An Online Algorithm for Combined Computing Workload and Energy Coordination Within A Regional Data Center Cluster
Shihan Huang, Dongxiang Yan, Yue Chen

TL;DR
This paper introduces a prediction-free, Lyapunov optimization-based online algorithm for coordinating workload and energy in regional data center clusters, ensuring physical limits and near-optimal performance.
Contribution
It develops a novel online algorithm with performance guarantees and a distributed implementation method for energy and workload coordination in data centers.
Findings
The algorithm guarantees workload and energy levels within physical limits.
It provides a theoretical bound on the performance gap compared to offline solutions.
Case studies demonstrate improved efficiency over existing methods.
| Algorithm | | | | | | Relative value of |
|---|---|---|---|---|---|---|
| B1 | 0.002 | -0.026 | 1.178 | 4.660 | 5.813 | 100% |
| B2 | 0.441 | 2.105 | 2.217 | 3.102 | 7.864 | 135% |
| B3 | 0.045 | 1.556 | 1.509 | 4.171 | 7.281 | 125% |
| Proposed | 0.037 | 1.412 | 1.501 | 4.182 | 7.132 | 123% |
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Queuing Theory Analysis · Software-Defined Networks and 5G
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Abstract
Regional data center clusters have flourished in recent years to serve customers in a major city with low latency. The optimal coordination of data centers in a regional cluster has become a pressing issue because of its rising energy consumption. In this paper, a Lyapunov optimization-based online algorithm is developed for the combined computing workload and energy coordination of data centers in a regional cluster. The proposed online algorithm is prediction-free and easy to implement. We prove that the workload queues and battery energy level will be within their physical limits, though their related time-coupling constraints are not considered explicitly in the proposed algorithm. The previous online algorithms do not have such a guarantee. A theoretical upper bound on the optimality gap between the online and offline results is derived to provide a performance guarantee for the proposed algorithm. To enable distributed implementation, an accelerated ADMM algorithm is developed with iteration truncation and follow-up well-designed adjustments, whereby a nearly optimal solution is attained with much enhanced computational efficiency. Case studies show the effectiveness of the proposed method and its advantages over the existing methods.
Keywords: data center, Lyapunov optimization, energy sharing, distributed coordination, online algorithm
NOMENCLATURE
Abbreviations
[TABLE]
Indices and Sets
[TABLE]
Parameters
[TABLE]
Decision Variables
[TABLE]
1 Introduction
Recent decades have witnessed a boom in data centers (DCs) driven by the information explosion. The installed base and the storage capacity of DCs worldwide grew by 30% and by a factor of 26, respectively, during 2010-2018 [1]. Regional DC clusters have become an inevitable trend in order to serve customers in major cities with low latency [2]. DCs are expected to be the largest energy consumers in 2030, accounting for 8% of global energy consumption [3]. To reduce the associated carbon emissions, renewable generators such as photovoltaic (PV) panels have been installed in many DCs [4]. Most DCs are also equipped with energy storage to maintain a reliable supply of electricity. However, the supply-demand mismatch, caused by the significant spatial and temporal variation of computing workloads and renewable generation, cannot be fully offset even by battery storage. To reduce the consequent efficiency loss, combined computing workload and energy coordination of DCs is necessary. Clustered at the city level, regional DCs can not only share computing tasks with each other but also share energy to reduce their overall operation cost, which is the focus of this paper.
The coordination of DCs has attracted great attention in recent years. To deal with the underlying uncertainties, stochastic optimization has been used to minimize the expected operation costs of DCs based on a number of scenarios sampled from real data. References [5] and [6] focused on the day-ahead planning and real-time operation of DCs, respectively. The economic dispatch of a power system with DCs was studied in [7], where the two-stage stochastic problem was solved by Benders decomposition. Robust optimization is another method of addressing uncertainties, by minimizing the worst-case DC operation cost [8, 9]. Despite this fruitful research, the references above adopted offline models, which assume complete information over the whole time horizon. However, accurate data on future electricity prices, renewable generation, and computing workloads are rarely available in practice. For actual implementation, the coordinated operation of DCs needs to be based on up-to-date information without future predictions. Therefore, an online coordination method is necessary.
As a classic online optimization technique, model predictive control (MPC) has been applied in dynamic provisioning and load balancing in DCs with workloads assumed to be known in a look-ahead window [10]. The trade-off between the residential and migration costs of load balancing was handled by a randomized algorithm based on MPC technique in [11]. An MPC-based control strategy was developed for DCs to participate in demand response programs [12]. The MPC method is easy to implement, but it still requires certain predictions about the future. A large window is usually necessary to ensure optimality. Meanwhile, it may suffer from heavy computational burdens.
Lyapunov optimization is another online optimization technique that can be applied without prediction [13]. An event-triggered mechanism was proposed to control a regional integrated energy system by Lyapunov optimization in [14], which reduced the decision-making frequency of the system. Lyapunov optimization was used in dynamic voltage and frequency scaling to promote the use of renewable energy, wherein an improvement in the service delay and deadline misses was shown [15]. Novel virtual queues of workloads were designed in [16], by which the worst-case service delay was bounded. However, previous research rarely considered the limit on the backlog of workloads, which is determined by the storage capacity of the DC. This is partly due to the difficulty of applying Lyapunov optimization when the backlog limit is considered, which this paper aims to overcome. Moreover, the works above did not consider energy sharing among different DCs. In particular, the DCs in a regional cluster are located close to each other. By selling surplus energy to other DCs with an energy shortage, the supply-demand mismatch in the region can be reduced and the overall operation efficiency of the regional cluster can be improved [17].
To facilitate energy sharing, a distributed implementation framework is needed for privacy protection and to reduce the computational burden. Several distributed algorithms have been developed for energy sharing among DCs [18], charging stations [19], microgrids [20], etc. One of the most widely used algorithms is the alternating direction method of multipliers (ADMM). However, a key deficiency of ADMM is its slow convergence, especially when the operational time scale is small. The convergence performance of ADMM has been improved in several studies, e.g., by over-relaxation [21] or by parallelization [22]. Reference [23] accelerated the ADMM algorithm by using a predictor-corrector technique. Unfortunately, these approaches all have their own shortcomings. For example, it may be tricky to tune the parameters of the over-relaxation method for optimality. When dealing with complex DC systems with fast-changing renewable generation, prediction and correction may still be too time-consuming.
This paper aims to develop a combined computing workload and energy coordination mechanism for DCs in a regional cluster, established in an online manner and able to adapt to the fast-changing environment. The computing workloads are properly allocated to the DCs, and the DCs in a regional cluster can share energy with each other. Our main contributions are twofold:
- A prediction-free online algorithm. A Lyapunov optimization-based algorithm is developed for the online combined computing workload and energy coordination of DCs. The proposed algorithm is prediction-free, meaning that the coordination decisions in each period are made based solely on up-to-date information, which is more practical. Distinct from existing online algorithms, a novel virtual queue is designed and used in the proposed algorithm. This allows our algorithm to ensure that the workload backlogs and battery energy level remain within their physical ranges, even though those range constraints are relaxed in the online algorithm. To the best of our knowledge, existing works do not provide such a guarantee. An upper bound on the optimality gap between the online and offline algorithms is also derived.
- An accelerated ADMM algorithm. For the distributed implementation of the online coordination algorithm above, an accelerated ADMM algorithm is developed. Specifically, the ADMM algorithm is sped up by stopping the iteration when a certain threshold is reached, followed by a set of well-designed adjustments to balance the computing workload and energy. The proposed algorithm can achieve a nearly optimal solution with much enhanced computational efficiency. Simulation results show that the computational time is reduced by 61% compared to the traditional ADMM algorithm. This is critical to the operation of DC systems, where parameters such as renewable generation are likely to change very quickly.
The rest of this paper is organized as follows. Section 2 introduces the problem formulation. An online algorithm based on Lyapunov optimization is proposed in Section 3, and an accelerated ADMM-based distributed coordination mechanism is developed in Section 4. Section 5 presents the simulation results. Finally, the conclusions are drawn in Section 6.
2 Mathematical Formulation
2.1 System Overview
In this paper, we consider the coordination of a regional cluster of DCs, which are allowed to purchase electricity from the power grids and exchange energy with each other to serve the computing workloads. An illustration of the workload and energy flows among DCs is shown in Figure 1, consisting of the three elements below:
- Front Ends (mapping nodes, MNs), which accept computing workload demands and distribute them to different back ends for further processing.
- Back Ends (DCs), which process workloads transferred from the front ends. Each back end is equipped with PV panels and battery energy storage. To obtain the electricity needed to process the computing workloads, it can purchase electricity from the power grid and from other DCs in the same regional cluster, and use the power supplied by its PV panels and energy storage. Surplus energy can be sold back to the power grid, used to charge the battery energy storage, or shared with other DCs in the same regional cluster.
- The Power Grid, which provides electricity and buys it back when necessary. The power grid operator also serves as a coordinator of the energy sharing among back ends.
As discussed above, due to the heterogeneity of renewable power generations and computing workloads, as well as the underlying uncertainties, online coordination of data centers is a must. In the following section, we start with an offline and centralized operation model for all DCs, then an online algorithm and a distributed coordination mechanism are developed in Sections 3 and 4, respectively.
2.2 Modeling of Data Center
Suppose there are back ends and front ends, and the time interval investigated is divided into time slots. The DC-related models are as follows.
2.2.1 Workload Flow Related Modeling
Denote the set of back ends linked to the front end and the set of front ends linked to the back end by and , respectively. For simplicity, we only consider non-critical requests, that is, part of the workloads arriving at the cluster can be rejected by the DCs, in which case they are either redirected to other clusters or ignored. The workloads accepted by front ends are first pushed into queues and then transferred to other queues at back ends, where they wait for processing. All workload queues follow the first-in-first-out principle at front ends and back ends. The workload queues at the back end and the front end are denoted by and , respectively. For we have
[TABLE]
where is the workload accepted by the front end at the time slot , is the workload transferred from the front end to the back end , is the workload that the back end processes. The workload queues are initialized to zero, i.e., , . Due to the physical limitation of DCs, and are bounded, i.e.,
[TABLE]
where is the workload request at the front end , and are the upper bounds of and .
Similarly, the workload queues should also be bounded:
[TABLE]
where and are the upper bounds of and respectively.
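The queue bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a front-end queue grows by the accepted workload and shrinks by the workload it transfers out, while a back-end queue grows by the workload it receives and shrinks by the workload it processes. All function and variable names are hypothetical, and the bounds check only mirrors the queue limits; it is not the paper's algorithm.

```python
from typing import Dict, Tuple


def step_queues(
    q_front: Dict[str, float],
    q_back: Dict[str, float],
    accepted: Dict[str, float],
    transferred: Dict[Tuple[str, str], float],
    processed: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Advance all queues by one time slot (assumed form of the dynamics)."""
    new_front = dict(q_front)
    new_back = dict(q_back)
    # Accepted workload enters the front-end queue.
    for j, a in accepted.items():
        new_front[j] += a
    # Transferred workload leaves front end j and enters back end i.
    for (j, i), d in transferred.items():
        new_front[j] -= d
        new_back[i] += d
    # Processed workload leaves the back-end queue.
    for i, p in processed.items():
        new_back[i] -= p
    return new_front, new_back


def within_bounds(queues: Dict[str, float], upper: Dict[str, float]) -> bool:
    """Check 0 <= queue <= upper bound for every queue."""
    return all(0.0 <= q <= upper[k] for k, q in queues.items())


front, back = step_queues(
    {"f1": 0.0},
    {"b1": 0.0, "b2": 0.0},
    accepted={"f1": 5.0},
    transferred={("f1", "b1"): 2.0, ("f1", "b2"): 1.0},
    processed={"b1": 1.5, "b2": 0.0},
)
print(front, back)
```

In an offline model these updates couple consecutive time slots, which is why the bounds have to be handled differently in the online algorithm of Section 3.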
Transferring workload from the front end to the back end causes bandwidth cost, which is proportional to the amount , i.e.,
[TABLE]
where is the corresponding cost coefficient. The income of the front end is determined by the workloads it takes in the time slot . Equivalently, we can use the disutility caused by rejecting part of the workloads for measurement:
[TABLE]
where is the disutility for not processing one unit of workload; is the workload request arriving at the front end , i.e., the maximum workload the front end can take.
2.2.2 Energy Flow Related Modeling
In the time slot , the back end buys electricity from the grids at a price and sells back to the grids at a price , where . Denote the amount of electricity bought and sold as and respectively, then the net cost of buying/selling electricity is:
[TABLE]
Apart from the energy trading with the power grid, a back end can also charge or discharge its battery energy storage. Let and be the charging and discharging power of battery energy storage in back end , respectively. Then,
[TABLE]
where and are the maximum charging and discharging power respectively. The energy loss caused by the charging and discharging of the battery energy storage is proportional to the charging and discharging power:
[TABLE]
where is the battery cost coefficient.
The energy levels of battery in two consecutive time slots satisfy:
[TABLE]
where and are the charging and discharging efficiency respectively; is the energy level of battery at the back end in the time slot , bounded by:
[TABLE]
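As a small illustration of the battery model, the sketch below assumes the standard storage update in which charging is scaled by the charging efficiency and discharging is divided by the discharging efficiency, with charge and discharge expressed as energy per time slot. The function names and the numeric values are hypothetical.

```python
def battery_step(energy: float, charge: float, discharge: float,
                 eta_c: float, eta_d: float) -> float:
    """Assumed update: E(t+1) = E(t) + eta_c * charge - discharge / eta_d."""
    return energy + eta_c * charge - discharge / eta_d


def battery_feasible(energy: float, e_min: float, e_max: float) -> bool:
    """Check that the energy level stays within its physical range."""
    return e_min <= energy <= e_max


level = battery_step(10.0, charge=2.0, discharge=0.0, eta_c=0.9, eta_d=0.9)
print(level, battery_feasible(level, e_min=1.0, e_max=20.0))
```

Because of the efficiencies, a charge-then-discharge round trip always loses energy, which is the physical reason behind discouraging unnecessary cycling.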
Since all the DCs are located in a regional cluster, they can share energy with one another. Denote the energy sold by the back end to the back end by , which satisfies the following constraints:
[TABLE]
where denotes the set of all back ends except the back end , is the maximum energy that can be shared. means that the back end buys energy from the back end . Otherwise, if , the back end sells to the back end . Therefore, the shared energy satisfies the coupling constraint (13a). Suppose the energy sharing price between back ends is , with to encourage local energy exchange.
The power balance condition of the back end in the time slot is
[TABLE]
where is the PV generation in the back end .
The total cost of the overall system in the time slot is:
[TABLE]
It is worth noting that the cost of energy sharing among back ends does not appear in (15) because the payments and incomes of back ends cancel each other out according to (13a).
2.3 Offline and Centralized Operation of Data Centers
With the above constraints and objective, the offline and centralized operation of the overall system is formulated as:
[TABLE]
where the objective (16) aims to minimize the time-average expected total costs of the overall system.
Although P1 is a linear program, it requires prior knowledge of electricity price , , workload demands , and PV generations over the whole time horizon, which is difficult to obtain in practice. Therefore, an online counterpart that enables real-time decision-making is desired. In the following section, we propose an online algorithm for P1.
Before we proceed, we make the following assumptions:
A1: ,
A2: ,
A3: ,
A4: ,
where
[TABLE]
A1–A3 are mild assumptions stating that the capacities of the front end queues, the back end queues, and the battery are large enough. For example, if , A3 is satisfied if we have , which means that the battery can be charged at maximum power for two consecutive time slots. Similarly, if , A3 holds if the battery can be discharged at maximum power for two consecutive time slots. Either case is easily met. A4 is natural: it indicates that buying electricity from the grid, storing it in energy storage, and then selling it is not as good as not buying power at all, even if we sell at a higher price and buy at a lower price . This assumption avoids unnecessary charging and discharging.
3 Online Algorithm
The main difficulty in developing an online counterpart lies in the time-coupling constraints (1a), (1b) and (11) in P1. In this section, we discuss how the time-coupling constraints can be equivalently removed by means of Lyapunov optimization. Then, by minimizing the upper bound of the drift-plus-penalty term, an online algorithm for solving P1 can be derived.
3.1 Problem Modification
To adapt to the Lyapunov optimization framework, the time-coupling constraints (1a), (1b) and (11) need to be reformulated in a time-average form. Specifically, summing (1a) over from 1 to and dividing by , we have
[TABLE]
Since and are bounded, we have
[TABLE]
Similarly, constraint (1b) and (11) can be transformed into
[TABLE]
Replacing the time-coupling constraints (1a), (1b) and (11) by (19)–(21), P1 can be transformed into P1′:
[TABLE]
In fact, P1′ is a relaxation of P1, since any solution feasible for P1 is also feasible for P1′. P1′ is an optimization problem with a time-average objective and time-average constraints, which fits the Lyapunov optimization framework.
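As a worked illustration of the averaging step in this subsection, the following LaTeX sketch writes out the telescoping argument for a front-end queue. The symbols $Q_j^t$ (queue), $a_j^t$ (accepted workload), $d_{ji}^t$ (transferred workload), $\mathcal{I}_j$ (linked back ends) and $Q_j^{\max}$ (queue bound) are placeholder notation assumed here, since the original symbols did not survive on this page, and the expectation operator is omitted for brevity.

```latex
% Assumed form of the front-end queue dynamics (cf. (1a)):
Q_j^{t+1} = Q_j^{t} + a_j^{t} - \sum_{i \in \mathcal{I}_j} d_{ji}^{t}

% Summing over t = 1, ..., T telescopes the queue terms:
\frac{1}{T} \sum_{t=1}^{T} \Big( a_j^{t} - \sum_{i \in \mathcal{I}_j} d_{ji}^{t} \Big)
  = \frac{Q_j^{T+1} - Q_j^{1}}{T}

% With 0 \le Q_j^{t} \le Q_j^{\max}, the right-hand side lies in
% [-Q_j^{\max}/T, \; Q_j^{\max}/T] and therefore vanishes as T grows:
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T}
  \Big( a_j^{t} - \sum_{i \in \mathcal{I}_j} d_{ji}^{t} \Big) = 0
```

The converse does not hold: a sequence can balance on average while a queue temporarily leaves its range, which is why the time-average form is only a relaxation.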
3.2 Lyapunov Optimization Based Method
P1′ is still time-coupled due to constraints (19)–(21). The next step is to transform P1′ into a time-decoupled form by means of Lyapunov optimization. We construct three virtual queues and use the drift-plus-penalty method to formulate a relaxation of P1′ that has no time-coupling constraints.
3.2.1 Construct Virtual Queues
The first step is to construct virtual queues for the front end workload dynamics, the back end workload dynamics, and the battery dynamics, denoted by , , and , respectively. The traditional method [16] is to build virtual queues , by modifying the workload queues in (1a)-(1b) slightly:
[TABLE]
This method is straightforward and easy to implement. However, it cannot ensure that the corresponding workload queues are within their physical upper bounds , . To overcome this shortcoming, in this paper novel virtual queues are proposed as follows:
[TABLE]
where , , , , are parameters to be determined later.
According to the definitions above, it is easy to prove that queues , , and are mean rate stable. For example, for the queue , according to (24a), we have
[TABLE]
Hence,
[TABLE]
The last equation is due to (19). Similarly, we have
[TABLE]
which means that all these virtual queues grow slower than linearly over time.
3.2.2 Obtain Lyapunov Function and Drift-Plus-Penalty
Based on the virtual queues, we define the Lyapunov function as follows:
[TABLE]
where , representing the current state of the system. The Lyapunov drift from the time slot to is then defined as
[TABLE]
Then, the drift-plus-penalty term is given by
[TABLE]
By minimizing (30) instead of the original objective function (16), the time-average constraints (19)–(21) can be omitted as long as an appropriate is chosen, which will be discussed later. However, now the term in the objective becomes time-coupled. Hence, further transformation is needed.
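To make the drift-plus-penalty idea concrete, here is a minimal Python sketch. It assumes the common quadratic Lyapunov function (half the sum of squared virtual queues) and evaluates a single realized transition rather than the conditional expectation used in the formal definition. The function names and the weight `v` are hypothetical stand-ins for the quantities defined above.

```python
from typing import Sequence


def lyapunov(queues: Sequence[float]) -> float:
    """Assumed quadratic Lyapunov function: 0.5 * sum of squared queues."""
    return 0.5 * sum(q * q for q in queues)


def drift_plus_penalty(queues_now: Sequence[float],
                       queues_next: Sequence[float],
                       cost: float, v: float) -> float:
    """One-sample drift plus v times the cost incurred in this slot."""
    drift = lyapunov(queues_next) - lyapunov(queues_now)
    return drift + v * cost


# A larger v puts more weight on the cost and less on queue stability.
for v in (0.5, 2.0):
    print(v, drift_plus_penalty([1.0, 2.0], [2.0, 1.0], cost=3.0, v=v))
```

The weight trades off queue stability against cost, which is the same trade-off examined numerically in Section 5.3.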
3.2.3 Minimizing the Upper Bound
The idea is to derive a time-decoupled upper bound of (30) and minimize it instead. According to (24b) – (24c), we have
[TABLE]
where
[TABLE]
Ignoring the constant terms in the objective function (which will not change the optimal solution), the online optimization problem can be formulated as follows:
[TABLE]
Note that constraints bounding workload queues (5) and the energy level of battery (12) have been removed, and the time-average constraints (19)–(21) are no longer needed because all of them are automatically satisfied by minimizing (33) as proved in Proposition 1 later. There is no time-coupling constraint in P2, which can be computed online.
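The following toy example illustrates what a time-decoupled per-slot problem of this kind looks like. It is not P2: it assumes a single battery, a single price, no efficiency losses, and a cost equal to price times net charging energy. Under these assumptions the per-slot objective is linear in the decision, so the minimizer sits at a bound. The names `h` (shifted virtual queue), `v`, `c_max` and `d_max` are hypothetical.

```python
def per_slot_battery_action(h: float, price: float, v: float,
                            c_max: float, d_max: float) -> float:
    """Minimize (v * price + h) * x over -d_max <= x <= c_max.

    x > 0 means charging, x < 0 means discharging. The linear term h * x
    comes from bounding the one-slot drift:
    0.5 * (h + x)**2 - 0.5 * h**2 = h * x + 0.5 * x**2
                                 <= h * x + 0.5 * max(c_max, d_max)**2.
    """
    coeff = v * price + h
    if coeff < 0:
        return c_max      # cheap energy or low virtual queue: charge
    if coeff > 0:
        return -d_max     # expensive energy or high virtual queue: discharge
    return 0.0


h = -50.0  # virtual queue, e.g. energy level minus a shift parameter
for price in (30.0, 80.0):
    x = per_slot_battery_action(h, price, v=1.0, c_max=2.0, d_max=2.0)
    print(price, x)
```

Only the current price and the current virtual queue are needed to decide, which is what makes this style of algorithm prediction-free.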
3.3 Performance Guarantee
To ensure that the optimal solution of P2 is feasible to P1, the parameter in (30) and the parameters in (24) must satisfy certain requirements as follows.
Proposition 1.
When assumptions A1–A4 hold, if and satisfy the following requirements:
[TABLE]
then the optimal solution obtained by P2 satisfies the constraints (5a), (5b) and (12).
The proof of Proposition 1 can be found in Appendix A. Due to A4, a proper is easy to find. To find the that satisfy (34), let
[TABLE]
Then for any , the requirement (34a) is always satisfied, and (34b) is reduced to
[TABLE]
Note that the right-hand side of (36) will not be a large number, since the capacities of front ends and back ends are usually close and so is the bandwidth. Hence, it is not difficult to find a that satisfies (34c) and (36).
Apart from the satisfaction of time-coupling constraints, another issue we care about is the optimality gap between the offline problem P1 and the online problem P2. Denote the values of in the optimal solutions of P1 and P2 by and respectively, the optimal value of P1 by , and let
[TABLE]
The optimality gap can be bounded in the proposition below.
Proposition 2.
The optimality gap is bounded by:
[TABLE]
The proof of Proposition 2 can be found in Appendix B. The discussion above reveals that the parameter is critical to the performance of the proposed algorithm. According to Proposition 2, should be as large as possible to minimize the gap between and . Meanwhile, is bounded above by (34c), otherwise the time-coupling constraints in P1 will be violated.
4 Distributed Implementation
P2 is still a centralized optimization problem, which may be impractical due to privacy concerns, communication and computational burdens. Therefore, a distributed algorithm is needed. In this section, an accelerated ADMM-based algorithm with iteration truncation is developed.
First, to distinguish the local variables of the three subsystems, we use , to denote the transferred workloads optimized by front ends and back ends, respectively, and , to denote the exchanged energy optimized by back ends and the power grid, respectively. Constraints (13a)-(13b) are then replaced by the following constraints:
[TABLE]
The two new variables should also be bounded:
[TABLE]
Coupling constraints are added to ensure that the optimal strategies provided by different subsystems are equal:
[TABLE]
In addition, in (33) also needs modification. Since the second term is optimized by back ends, in this term is replaced by . Denote the modified by :
[TABLE]
Then the augmented Lagrangian function is given by
[TABLE]
where , are the corresponding dual variables. Then P2 can be equivalently transformed into:
[TABLE]
The traditional ADMM algorithm can be applied to solve P2 in a distributed manner. In particular, the decision variables of front ends, back ends and the power grids, denoted by , and respectively, are
[TABLE]
In the -th iteration, the front end solves
[TABLE]
The back end solves
[TABLE]
The power grid solves
[TABLE]
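The alternating structure of these updates can be illustrated on a toy two-agent consensus problem. The sketch below is not the three-subsystem problem above: it assumes two scalar quadratic costs, one shared variable, and the scaled-dual form of ADMM with closed-form updates. All names and parameter values are hypothetical. The `max_iter` argument also shows what iteration truncation means in practice.

```python
from typing import Tuple


def admm_consensus(a1: float, c1: float, a2: float, c2: float,
                   rho: float = 1.0, tol: float = 1e-9,
                   max_iter: int = 500) -> Tuple[float, float, int]:
    """Minimize 0.5*a1*(x-c1)^2 + 0.5*a2*(z-c2)^2 subject to x = z."""
    x = z = u = 0.0  # u is the scaled dual variable
    k = 0
    for k in range(1, max_iter + 1):
        # Agent 1 update (closed-form minimizer of its augmented cost).
        x = (a1 * c1 + rho * (z - u)) / (a1 + rho)
        # Agent 2 update, using the fresh x.
        z_prev = z
        z = (a2 * c2 + rho * (x + u)) / (a2 + rho)
        # Dual update accumulates the consensus mismatch.
        u += x - z
        # Stop when primal and dual residuals are both small.
        if abs(x - z) < tol and rho * abs(z - z_prev) < tol:
            break
    return x, z, k


x, z, iters = admm_consensus(2.0, 1.0, 1.0, 4.0)
print(x, z, iters)          # both copies approach (2*1 + 1*4) / 3 = 2
x3, z3, _ = admm_consensus(2.0, 1.0, 1.0, 4.0, max_iter=3)
print(abs(x3 - z3))         # truncation leaves a consensus mismatch
```

When the loop is cut short, the two local copies disagree, so any coupling constraint between them is violated until it is repaired by a separate adjustment step.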
However, the traditional ADMM may need a considerable number of iterations to converge. The random parameters, e.g., electricity prices and workloads, may have already changed before the algorithm has converged, rendering it less practical. A straightforward way to lighten the computational burden is to stop the iteration when it takes too long to converge. However, this may lead to violation of the coupling constraints (41a) and (41b). To tackle this issue, we propose the following adjustment method.
Suppose the iteration is truncated at the step , we choose as the optimal strategy and let
[TABLE]
Then, constraint (41a) is met but constraint (14) may be violated due to the change of . Denote by the gap between the left-hand side and the right-hand side of (14):
[TABLE]
This gap can be filled by adjusting the amount of energy bought from or sold to the grids. If , we increase by the same amount; otherwise we increase by the same amount:
[TABLE]
The treatment for and is similar. Let
[TABLE]
Then (41a) is met but may go beyond the bounds when updating the workload queue using (1a). To this end, needs to be adjusted. If , let
[TABLE]
Then update using the new .
The overall procedure is given in Algorithm 1.
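The energy-side adjustment after truncation can be sketched as follows. This is a minimal illustration under an assumed sign convention: `supply` collects PV output, battery discharge and energy received from other back ends, while `demand` collects the computing load, battery charging and energy sent to other back ends. The function name and the convention are hypothetical, and the workload-side adjustment is not covered.

```python
from typing import Tuple


def rebalance_with_grid(supply: float, demand: float,
                        buy: float, sell: float) -> Tuple[float, float]:
    """Close the power-balance gap by adjusting grid purchases or sales."""
    gap = (demand + sell) - (supply + buy)  # > 0 means energy shortfall
    if gap > 0:
        buy += gap       # buy the missing energy from the grid
    else:
        sell += -gap     # sell the surplus energy to the grid
    return buy, sell


buy, sell = rebalance_with_grid(supply=10.0, demand=12.0, buy=1.0, sell=0.0)
print(buy, sell)  # the balance supply + buy == demand + sell now holds
```

Since only the grid exchange is changed, the repaired solution stays feasible but may cost slightly more than the fully converged one.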
5 Case Studies
In this section, the performance of the proposed algorithm is presented and compared with benchmark algorithms. Impact of various factors on the performance is also analyzed. Finally, the scalability of the proposed algorithm is examined.
5.1 Simulation Setup
The proposed algorithm is implemented in MATLAB 2022a and the simulations are performed on a desktop PC with an Intel i5-10505 CPU and 8 GB RAM. We first use a simple DC system with 2 front ends and 3 back ends for illustration. The parameters related to the batteries are MWh, MWh, kWh, kWh, , . The upper bounds of workload queues are MWh. Real-world data are employed including the electricity prices of PJM [24], the workload traces of Google cluster [25], and solar radiation data of NREL [26]. The simulation is conducted over a one-week period divided into 2016 time slots, where each time slot is 5 minutes.
To verify the effectiveness of the proposed online algorithm, we examine the performance of the following algorithms or models with the same parameters for comparison:
- B1: Offline algorithm, i.e., solving P1 directly assuming complete future information.
- B2: Greedy algorithm. We set a threshold of buying prices of electricity. The battery of back end is charged at the maximum power, i.e., , if the buying price at back end is lower than . Then is minimized in each time slot with the same constraints as P1 for that specific time slot.
- B3: Proposed online algorithm on a model without energy sharing between back ends. This can be done by setting the lower and upper bounds of shared energy to zero.
- B4: Traditional online algorithm using virtual queues (23a), (23b), and (24c).
5.2 Performance Comparisons
Figure 2 shows the battery energy of the back ends using the proposed algorithm. It can be observed that constraint (12) is always met for all back ends. All traces are below the maximum battery energy (the red dashed line), while the minimum battery energy is far below the plot window. The workload queues at front ends and back ends are presented in Figure 2 (right) and Figure 3 (left), respectively. Only part of the traces () is presented for the sake of clarity. Constraints (5a) and (5b) are all satisfied even though they are not explicitly considered in P2, which corroborates Proposition 1.
Furthermore, we compare the performance of the proposed online algorithm with algorithms B1, B2 and B3 stated in Section 5.1. The workload queues attained by B4 are presented in Figure 3 (right). In the results of B4, the queues at all front ends keep growing far beyond their physical limits, which reveals the necessity of an algorithm that guarantees the physical limits are respected. The accumulated operation costs, i.e., the total costs from the beginning to the present time slot, of the overall DC system are shown in Figure 4. Compared to B2 and B3, the proposed algorithm reduces the total costs by 12% and 2% of the offline optimal cost (B1), respectively. This shows the advantages of the proposed algorithm and that energy sharing among different data centers can improve the overall efficiency. Although the offline model B1 has the lowest total costs, it is of little practical use due to the lack of future information. The time-average gap between the proposed algorithm and B1 is 655 USD at the end of the time horizon, less than the right-hand side of (38), which verifies Proposition 2. The details of the costs are listed in Table 1.
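The relative figures can be cross-checked against the cost table reproduced near the top of this page, which is assumed here to be Table 1. The sketch below assumes that the first four numeric columns are cost components and the fifth is their total; the column names did not survive on this page, so that reading is an inference from the row sums.

```python
from typing import Dict, List

# Rows copied from the cost table: four components, then the total.
ROWS: Dict[str, List[float]] = {
    "B1": [0.002, -0.026, 1.178, 4.660, 5.813],
    "B2": [0.441, 2.105, 2.217, 3.102, 7.864],
    "B3": [0.045, 1.556, 1.509, 4.171, 7.281],
    "Proposed": [0.037, 1.412, 1.501, 4.182, 7.132],
}


def relative_to_offline(rows: Dict[str, List[float]],
                        baseline: str = "B1") -> Dict[str, int]:
    """Total cost of each method as a rounded percentage of the baseline."""
    base = rows[baseline][-1]
    return {name: round(100 * vals[-1] / base) for name, vals in rows.items()}


def component_gap(vals: List[float]) -> float:
    """Difference between the listed total and the sum of the components."""
    return abs(sum(vals[:-1]) - vals[-1])


print(relative_to_offline(ROWS))
print({name: round(component_gap(v), 3) for name, v in ROWS.items()})
```

On this reading, the proposed algorithm sits 12 points below B2 and 2 points below B3 when all totals are expressed as a percentage of the B1 total.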
5.3 Impact of Parameters
We further test the impact of in the objective function (33) of P2. Let , respectively and record the accumulated overall operation costs in Figure 5. It can be observed that the costs are reduced as increases since it puts more emphasis on in (33). The traces of battery energy provide some insights into this phenomenon: As shown in Figure 5, the charging and discharging of battery become more frequent as goes down. This is because the stability of virtual queues (24b)–(24c), including the one related to battery energy, is prioritized since dominates the objective function, keeping battery energy level in a smaller range. Consequently, the battery cost is increased from USD when to USD when .
The effect of P2P energy sharing on the total costs of DCs is also investigated. A drop in the costs can be observed in Figure 6 as the upper bound of P2P trading is increased, revealing the improvement in the efficiency of the DC system. The reason is as follows. A DC short of energy may turn to other DCs instead of buying electricity from the main grid; since , the payment of the DC decreases. Similarly, since , a DC with surplus energy can sell to other DCs instead of selling to the grid to earn a profit. The relationship between the net cost of buying electricity and is presented in Figure 6.
5.4 Convergence & Scalability
In the following, we focus on the effectiveness of the proposed accelerated ADMM algorithm. First, we solve the problem using the traditional ADMM algorithm. To illustrate the computational burden in this test case, the numbers of iterations required to converge are recorded and shown in Figure 7 as a histogram. While most of the numbers fall below 10, it may take up to 1000 iterations to converge in extreme cases, which is probably too time-consuming for the real-time operation of a DC system. Note that only 21% of time slots need more than 50 iterations to converge in Figure 7. Therefore, we use 50 as the threshold to implement the accelerated algorithm with iteration truncation. We choose 6 examples that take excessive iterations to converge in the traditional algorithm and plot and , and in Figure 8. The results show that the discrepancies between and , and are negligible, which means 50 is a proper threshold. The maximum relative errors and , defined as follows, are less than 0.2%.
[TABLE]
The accumulated overall operation costs obtained by the proposed accelerated ADMM algorithm and its centralized counterpart are compared in Figure 9 (left). The two curves coincide with each other, showing the reliability of the proposed distributed algorithm. The results of the traditional ADMM algorithm and the proposed accelerated algorithm with truncation are compared in Figure 9 (right). The two traces nearly overlap with each other, which means the truncation does not affect the optimality of the results much.
Finally, we evaluate the scalability of the proposed algorithm using larger systems with more front ends and back ends. We fix the number of back ends to 2 and change from 3 to 20, and fix the number of front ends to 3 and change from 2 to 20, respectively. The computational time per agent is shown in Figure 10. The time needed is acceptable for real-time DC operation. Moreover, the proposed accelerated ADMM saves more than half (61%) of the computational time compared to the traditional ADMM.
6 Conclusion
In this paper, a distributed online algorithm is proposed for combined computing workload and energy coordination of data centers. The online algorithm is based on Lyapunov optimization with a novel design of virtual queues. An accelerated ADMM algorithm is developed for fast distributed implementation. Through simulation, we have the following findings:
1. The proposed online algorithm is effective and can achieve a nearly offline optimal outcome.
2. The proposed accelerated ADMM algorithm reduces the computational time compared to the traditional ADMM algorithm.
3. The proposed algorithm reduces the total costs compared to the greedy algorithm and the one without energy sharing.
Incorporating the heterogeneous features of computing workloads and balancing multiple objectives (total costs, carbon emissions, etc.) will be our future research direction.
Appendix A Proof of Proposition 1
Before we prove Proposition 1, we give the following lemma.
Lemma 1.
Suppose are differentiable. Suppose is the optimal solution of the following optimization:
[TABLE]
Then, if and , we have .
Proof of Lemma 1. Problem (A.1) is equivalent to
[TABLE]
Given , the inner minimization problem is a linear program whose optimal solution lies at one of the vertices of the feasible region. Hence, the optimal solution is either or . In the first case, problem (A.1) can be further turned into
[TABLE]
Therefore, if , we have . Similarly, if it is the second case, problem (A.1) can be further turned into
[TABLE]
Therefore, if , we have . This completes the proof of Lemma 1.
First, we take the partial derivatives of the objective function in (33) with respect to , , , , , respectively. For and that are decoupled from other variables, we directly calculate their derivatives as follows.
[TABLE]
For , , that are coupled through constraint (14), eliminating by (14) and taking the derivatives yields
[TABLE]
Similarly, we can eliminate in the same way, yielding
[TABLE]
After getting the derivatives, we prove the satisfaction of constraints (5a), (5b), and (12) as follows.
A.1 Proof of Constraint (5a)
- When , we have
[TABLE]
The second inequality is derived from the condition that
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case and the assumption A1. Therefore, is still within .
- When ,
[TABLE]
- When ,
[TABLE]
The second inequality is derived from the condition that
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case and the assumption A1. Therefore, is still within .
A.2 Proof of Constraint (5b)
- When , we have
[TABLE]
These can be derived from the condition that
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case, the assumption A2, and Lemma 1. Therefore, is still within .
- When ,
[TABLE]
- When
[TABLE]
The second inequality is derived from
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case and the assumption A2.
Therefore, is still within .
A.3 Proof of Constraint (12)
- When , we have
[TABLE]
These can be derived from assumption A4 and the conditions that
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case, the assumption A3, and Lemma 1. Therefore, is still within .
- When ,
[TABLE]
- When , we have
[TABLE]
These can be derived from assumption A4 and the conditions that
[TABLE]
Thus and
[TABLE]
which is because of the range of in this case, the assumption A3, and Lemma 1.
Therefore, is still within .
This completes the proof.
Appendix B Proof of Proposition 2
According to (3.2.3), we have
[TABLE]
According to the strong law of large numbers,
[TABLE]
The second equality of (B.2) – (B.4) can be derived from (24b) – (24c), hence the right-hand side of (B) can be reduced. By summing the new inequality over , we have
[TABLE]
We divide both sides of (B) by and take the limit with going to infinity, yielding
[TABLE]
This completes the proof.
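The summation and limit steps follow the usual drift-plus-penalty pattern, which can be sketched in generic form. The symbols below are assumptions chosen for illustration and are not the paper's notation: $L(\cdot)$ is a nonnegative Lyapunov function, $\Theta_t$ the vector of virtual queues, $f_t$ the per-slot cost, $f^{\star}$ the benchmark time-average cost, $B$ a constant bounding the drift terms, and $V > 0$ the weight on the cost term.

```latex
\begin{align}
% Per-slot drift-plus-penalty bound
\mathbb{E}\!\left[L(\Theta_{t+1}) - L(\Theta_t)\right] + V\,\mathbb{E}[f_t]
  &\le B + V f^{\star}, \\
% Summing over t = 1,...,T makes the drift terms telescope
\mathbb{E}\!\left[L(\Theta_{T+1})\right] - \mathbb{E}\!\left[L(\Theta_1)\right]
  + V \sum_{t=1}^{T} \mathbb{E}[f_t]
  &\le T B + T V f^{\star}, \\
% Dividing by VT, using L >= 0 and a finite initial value, and letting T grow
\limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[f_t]
  &\le f^{\star} + \frac{B}{V}.
\end{align}
```

Under these assumptions the gap to the benchmark shrinks in proportion to $1/V$, which is the typical form of a Lyapunov optimality-gap bound.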
