DATA CENTER SERVER PROVISION: DISTRIBUTED ASYNCHRONOUS CONTROL FOR COUPLED RENEWAL SYSTEMS
ABSTRACT
This paper considers a cost minimization problem for data centers with N servers and randomly arriving service requests. A central router decides which server to use for each new request. Each server has three types of states (active, idle, setup) with different costs and time durations. The servers operate asynchronously over their own states and can choose one of multiple sleep modes when idle. We develop an online distributed control algorithm so that each server makes its own decisions, the request queues are bounded, and the overall time average cost is near optimal with probability 1. The algorithm does not need probability information for the arrival rate or job sizes. Next, an improved algorithm that uses a single queue is developed via a "virtualization" technique, which is shown to provide the same (near optimal) costs. Simulation experiments on a real data center traffic trace demonstrate the efficiency of our algorithm compared to other existing algorithms.
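A minimal sketch of the system model described in the abstract: each of the N servers cycles through active, idle, and setup states, each state carrying its own cost rate, and an idle server may choose among several sleep modes that trade lower idle cost for longer setup time. All field names and numeric values below are illustrative assumptions, not taken from the paper.

# Sketch of the server/state/cost structure; names and numbers are assumptions.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Server:
    active_cost: float            # cost per unit time while serving requests
    idle_costs: dict              # sleep mode -> cost per unit time while idle
    setup_times: dict             # sleep mode -> time needed to wake back up
    state: str = "idle"           # one of "active", "idle", "setup"
    queue: deque = field(default_factory=deque)   # requests routed to this server

# Four identical servers; deeper sleep is cheaper while idle but slower to wake.
servers = [Server(active_cost=1.0,
                  idle_costs={"light": 0.3, "deep": 0.1},
                  setup_times={"light": 1.0, "deep": 5.0})
           for _ in range(4)]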
PROPOSED SYSTEM:
On the theoretical side, the current paper is the first to consider the stochastic control problem with heterogeneous servers and multiple idle and setup states at each server. This gives rise to a nonstandard asynchronous problem with coupled Markov decision systems. A novel aspect of the solution is the construction of a process with super-martingale properties by piecing together the asynchronous processes at each server. This is interesting because neither the individual server processes nor their asynchronous sum are super-martingales. The technique yields a simple distributed algorithm that can likely be applied more broadly to other coupled stochastic MDPs. On the practical side, we run our algorithm on a real data center traffic trace with server setup times on the order of seconds. Simulation experiments show that our algorithm is effective compared to several existing algorithms.
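A hedged sketch of the kind of frame-based decision rule used in Lyapunov/renewal optimization: at each frame boundary a server compares its candidate sleep/serve options by a ratio of weighted expected cost minus backlog-weighted expected service to expected frame length, using only its own queue. The option fields, the weight V, and the exact ratio are illustrative assumptions; the paper's precise algorithm is not reproduced here.

# Sketch of a drift-plus-penalty ratio rule for a renewal frame; illustrative only.
def choose_option(options, queue_backlog, V):
    """Pick the option minimizing
       (V * E[frame cost] - queue_backlog * E[service in frame]) / E[frame length].
    Each option is a dict with expected 'cost', 'service', and 'length'."""
    def ratio(opt):
        return (V * opt["cost"] - queue_backlog * opt["service"]) / opt["length"]
    return min(options, key=ratio)

# Example: an idle server with backlog 3 weighs a deep-sleep frame against
# waking up (paying setup) to serve; all numbers are made up.
options = [
    {"name": "deep_sleep", "cost": 0.1, "service": 0.0, "length": 1.0},
    {"name": "wake_and_serve", "cost": 6.0, "service": 4.0, "length": 5.0},
]
best = choose_option(options, queue_backlog=3, V=2.0)
print(best["name"])   # larger backlogs push the decision toward serving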
EXISTING SYSTEM:
In this paper, we use renewal-reward theory together with Lyapunov optimization to develop a simple implementable algorithm for this problem. Our work is not alone in approaching the problem this way. Prior work uses Lyapunov theory for a system with one renewal server, while other work considers a multi-server system but in a deterministic context. A related work develops a two-stage algorithm for stochastic multi-server systems, but the first stage must be solved offline. This paper is distinct in that we overcome the open challenges of previous work and develop a near optimal, fully online algorithm for a stochastic system with multiple servers.

Several alternative approaches to multi-server systems use queueing theory. Specifically, one line of work treats the multi-server system as an M/M/k/setup queue, whereas another models a single server system as a multi-class M/G/1 queue. These approaches require Poisson traffic and do not treat the asynchronous control problem with multiple sleep options considered in the current paper.

Experimental work on power and delay minimization proposes to turn each server ON and OFF according to the rule of an M/M/k/setup queue. Other work applies Lyapunov optimization to optimize power in virtualized data centers; however, it assumes each server has negligible setup time and that ON/OFF decisions are made synchronously at each server. Further works focus on power-aware provisioning over a time scale large enough that the whole data center can adjust its service capacity: one considers load balancing across geographically distributed data centers, and another considers provisioning over a finite time interval and introduces an online 3-approximation algorithm.
CONCLUSIONS
This paper proposes an efficient distributed asynchronous control algorithm that reduces cost in a data center, where the front-end load balancer makes slot-wise routing decisions, sending each request to the shortest queue, and each server makes frame-based service decisions by looking only at its own request queue. Theoretically, this algorithm is shown to achieve near optimal cost while stabilizing the request queues. Simulation experiments on a real data center traffic trace demonstrate that our algorithm outperforms several other algorithms in reducing power consumption as well as achieving lower delays.
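A hedged sketch of the control structure described above: on every time slot the front-end load balancer routes new arrivals to the shortest queue, while each server independently decides, at its own frame boundaries, whether to serve or sleep, looking only at its own backlog. The arrival process, frame lengths, and service amounts below are illustrative assumptions, not the paper's exact algorithm.

# Sketch of slot-wise routing plus asynchronous frame-based server decisions.
import random

random.seed(0)
num_servers = 3
queues = [0] * num_servers            # per-server request backlogs
frames_left = [0] * num_servers       # slots remaining in each server's current frame

for slot in range(20):
    # Front end: route each arrival in this slot to the shortest queue.
    for _ in range(random.randint(0, 2)):
        target = min(range(num_servers), key=lambda i: queues[i])
        queues[target] += 1

    # Servers: each acts asynchronously, only at its own frame boundary,
    # and looks only at its own backlog.
    for i in range(num_servers):
        if frames_left[i] == 0:
            if queues[i] > 0:
                frames_left[i] = 2                 # a short "serve" frame
                queues[i] = max(0, queues[i] - 2)
            else:
                frames_left[i] = 4                 # a longer "sleep" frame
        else:
            frames_left[i] -= 1

print("final backlogs:", queues)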
REFERENCES
[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols. 1 and 2. Athena Scientific, Belmont, Mass., 1995.
[2] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2005.
[3] S. Ross. Introduction to Probability Models. Academic Press, 8th edition, 2002.
[4] R. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, Boston, 1996.
[5] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control in wireless networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149, 2006.
[6] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
[7] M. J. Neely. Dynamic Optimization and Learning for Renewal Systems. IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 32-46, Jan. 2013.
[8] M. J. Neely. Asynchronous Scheduling for Energy Optimality in Systems with Multiple Servers. Proc. 46th Annual Conf. on Information Sciences and Systems (CISS), March 2012.
[9] M. J. Neely. Online Fractional Programming for Markov Decision Systems. Proc. Allerton Conf. on Communication, Control, and Computing, Sept. 2011.
[10] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf. Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward. Proc. ACM Sigmetrics, pp. 153-166, 2013.
[11] D. D. Yao. Dynamic Scheduling via Polymatroid Optimization. Proc. Performance Evaluation of Complex Systems: Techniques and Tools, pp. 89-113, 2002.