Renewable Energy-Aware Big Data Analytics in Geo-distributed Data Centers with Reinforcement Learning

Abstract:

In the age of big data, companies increasingly deploy their services in data centers rather than on their own servers. The demand for big data analytics has grown significantly, leading to extremely high electricity consumption at data centers. In this paper, we investigate the cost minimization problem of big data analytics on geo-distributed data centers connected to renewable energy sources with unpredictable capacity. To solve this problem, we propose a Reinforcement Learning (RL) based job scheduling algorithm that combines RL with a neural network (NN). Moreover, two techniques are developed to enhance the performance of our proposal. Specifically, Random Pool Sampling (RPS) is proposed to retrain the NN via accumulated training data, and a novel Unidirectional Bridge Network (UBN) structure is designed to further accelerate training by reusing the historical knowledge stored in the trained NN. Experimental results on real Google cluster traces and electricity prices from the U.S. Energy Information Administration show that our approach reduces the data centers' cost significantly compared with other benchmark algorithms.
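To make the RPS idea above concrete, the following Python sketch retrains a model on minibatches drawn uniformly at random from an accumulated pool of past transitions. The pool capacity, batch size, and the `model.train_step` interface are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

class RandomPool:
    """Accumulates past training samples and serves random minibatches.

    A minimal sketch in the spirit of RPS; capacity and the uniform
    sampling scheme are illustrative assumptions.
    """

    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest samples evicted first

    def add(self, sample):
        # sample: (state, action, reward, next_state)
        self.pool.append(sample)

    def sample(self, batch_size=32):
        # Uniform sampling breaks temporal correlation in the job stream
        return random.sample(self.pool, min(batch_size, len(self.pool)))

# Hypothetical usage with an assumed NN wrapper 'model':
# pool = RandomPool()
# pool.add((state, action, reward, next_state))
# model.train_step(pool.sample())
```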

Existing System:

We investigate a renewable energy-aware job scheduling problem in geo-distributed data centers for streaming big data analytics. In particular, we consider a set of streaming big data jobs, each of which runs on a cluster of virtual machines hosted in several geo-distributed data centers connected to both the traditional power grid and renewable energy sources with unpredictable capacity [12]. When more renewable energy is generated at a data center due to favorable weather conditions, migrating big data jobs to that data center can decrease energy consumption from the power grid, at the price of an incurred migration overhead. This overhead can be high when migration coincides with network traffic congestion. Our proposal is designed to minimize the total cost of grid energy consumption and job migration without any knowledge of future renewable energy generation.
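For concreteness, the objective described above can be sketched as follows; the notation (time intervals, prices, migration costs) is our own assumption rather than the paper's formulation:

```latex
% Hedged sketch of the cost objective; all symbols are our assumptions.
% p_d^t : grid electricity price at data center d in interval t
% e_d^t : energy drawn from the grid at d in interval t
% c_d   : unit migration cost into d;  m_d^t : data migrated into d in t
% \ell_d^t : energy demand of jobs at d;  r_d^t : renewable generation at d
\min \sum_{t} \sum_{d} \Big( p_d^t \, e_d^t + c_d \, m_d^t \Big),
\qquad e_d^t = \max\!\big(0,\; \ell_d^t - r_d^t\big)
```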

Proposed System:

We propose a novel job scheduling algorithm based on reinforcement learning (RL) [13] that approximates the optimal solution by iteratively learning from the feedback of historical job scheduling decisions (i.e., job locations in different time intervals), which are also referred to as actions. The learning process of RL consists of a sequence of actions and the corresponding rewards. In each iteration, RL maintains a value function to evaluate the expected effect of taking different actions. After the action selected by the value function is applied, RL observes the state that appears thereafter (e.g., the currently generated renewable energy and the data center load) together with the reward associated with this state, and uses this reward to refine its value function, as sketched below. Although RL is a promising approach, several challenges must be addressed before it can quickly approximate the optimal solution for streaming big data analytics.
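The following Python sketch shows this learning loop as a standard temporal-difference value-function update. The linear approximator, state/action dimensions, and hyperparameters are simplifying assumptions for illustration, not the paper's neural network.

```python
import numpy as np

# A value function Q(s, a) is refined from observed
# (state, action, reward, next state) feedback, as described above.

STATE_DIM, N_ACTIONS = 8, 4     # e.g., renewable levels + DC loads; job placements
ALPHA, GAMMA = 0.01, 0.95       # learning rate, discount factor (assumed values)

W = np.zeros((N_ACTIONS, STATE_DIM))  # one weight vector per action

def q_values(state):
    """Q(s, a) for every action a, under a linear approximator."""
    return W @ state

def update(state, action, reward, next_state):
    """One TD step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + GAMMA * np.max(q_values(next_state))
    td_error = target - q_values(state)[action]
    W[action] += ALPHA * td_error * state  # gradient step for linear Q
```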

Conclusion:

In this paper, we have investigated a big data scheduling problem for reducing the cost of geo-distributed data centers. An RL-based job scheduling algorithm is proposed with an NN, and two techniques are developed to enhance the performance of our proposal. Specifically, we propose RPS to retrain the NN via accumulated training data, and design a novel UBN structure to further accelerate training. Extensive experiments show that our proposal reduces the data centers' cost significantly compared with several benchmark algorithms. This work motivates several interesting directions for future study. With respect to the design of parallel algorithms, one direction is to use multiple processes to accelerate the learning speed of the job scheduler. Another is to investigate the combination of ensemble learning and reinforcement learning for rapid deployment. In the field of decentralized distributed systems, multi-agent reinforcement learning is a further direction worth studying. As for the exploration policy, our approach uses a simple probability-based greedy algorithm; however, the exploration policy is also important in practice. Further studies will focus on reducing the cost of cloud services.
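The probability-based greedy exploration mentioned above is commonly realized as an epsilon-greedy rule; a minimal sketch follows, where the epsilon value is an assumed default rather than the paper's setting.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```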

REFERENCES

[1] K. Wang, Y. Wang, X. Hu, Y. Sun, D.-J. Deng, A. Vinel, and Y. Zhang, “Wireless big data computing in smart grid,” IEEE Wireless Communications, Vol. 24, No. 2, Apr. 2017, pp. 58-64.

[2] P. Li, S. Guo, T. Miyazaki, M. Xie, J. Hu, and W. Zhuang, “Privacy-preserving access to big data in the cloud,” IEEE Cloud Computing, Vol. 3, No. 5, Sept. 2016, pp. 34-42.

[3] Wholesale Electricity and Natural Gas Market Data, [Online]. Available: http://www.eia.gov/electricity/wholesale/

[4] C.-M. Wu, R.-S. Chang, and H.-Y. Chan, “A green energy-efficient scheduling algorithm using the DVFS technique for cloud datacenters,” Future Generation Computer Systems, Vol. 37, No. 7, Jul. 2014, pp. 141-147.

[5] J. Heo, D. Henriksson, X. Liu, and T. Abdelzaher, “Integrating adaptive components: an emerging challenge in performance-adaptive systems and a server farm case-study,” in Proceedings of IEEE International Real-Time Systems Symposium, 2007, pp. 227-238.

[6] Z. Liu, M. Lin, A. Wierman, S. Low, and L. L. H. Andrew, “Greening geographical load balancing,” IEEE/ACM Transactions on Networking, Vol. 23, No. 2, Apr. 2015, pp. 657-671.

[7] L. Rao, X. Liu, L. Xie, and W. Liu, “Minimizing electricity cost: Optimization of distributed internet data centers in a multi-electricity-market environment,” in Proceedings of IEEE INFOCOM, Mar. 2010, pp. 1-9.

[8] C. Xu, K. Wang, G. Xu, P. Li, S. Guo, and J. Luo, “Making big data open in collaborative edges: a blockchain-based framework with reduced resource requirements,” in Proceedings of IEEE International Conference on Communications, May 20-24, 2018, Kansas City, MO, USA.

[9] P. Li, S. Guo, T. Miyazaki, X. Liao, H. Jin, A. Y. Zomaya, and K. Wang, “Traffic-aware geo-distributed big data analytics with predictable job completion time,” IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 6, 2017, pp. 1785-1796.

[10] X. Zhou, K. Wang, W. Jia, and M. Guo, “Reinforcement learning based adaptive resource management of differentiated services in geo-distributed data centers,” in Proceedings of IEEE/ACM International Symposium on Quality of Service (IWQoS), Vilanova, 2017.

[11] C. Xu, K. Wang, and M. Guo, “Intelligent resource management in blockchain-based cloud datacenters,” IEEE Cloud Computing, Vol. 4, No. 6, 2017, pp. 50-59.

[12] K. Wang, H. Li, Y. Feng, and G. Tian, “Big data analytics for system stability evaluation strategy in the Energy Internet,” IEEE Transactions on Industrial Informatics, Vol. 13, No. 4, pp. 1969-1978, Aug. 2017.

[13] Y. J. Liu, L. Tang, S. Tong, C. L. P. Chen, and D. J. Li, “Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, No. 1, pp. 165-176, Jan. 2015.

[14] X. He, K. Wang, T. Miyazaki, H. Huang, Y. Wang, and S. Guo, “Green resource allocation based on deep reinforcement learning in content-centric IoT,” IEEE Transactions on Emerging Topics in Computing, Vol. PP, No. 99, pp. 1-16, Feb. 2018.

[15] X. Lin, Y. Wang, and M. Pedram, “A reinforcement learning-based power management framework for green computing data centers,” in Proceedings of IEEE International Conference on Cloud Engineering, 2016, pp. 135-138.