The Power of d Choices in Scheduling for Data Centers with Heterogeneous Servers
Amir Moaddeli, Iman Nabati Ahmadi, Negin Abhar

TL;DR
This paper explores low-complexity load-balancing algorithms that apply the power of $d$ choices to data centers with heterogeneous servers, demonstrating improved performance and efficiency over traditional methods.
Contribution
It introduces the Balanced-Pandas-Pod algorithm, combining $d$ choices with Balanced-Pandas, showing better performance and lower complexity than existing algorithms in large-scale data centers.
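To make the power-of-$d$-choices idea concrete, the following is a minimal sketch rather than the paper's algorithm. It samples $d$ servers uniformly at random and routes the task to the cheapest of them, instead of scanning every server. The function names, the `cost` callback, and the example workloads are hypothetical; in a Balanced-Pandas-style scheduler the cost would presumably be a locality-weighted workload, which this summary does not specify.

```python
import random
from typing import Callable, Optional


def pick_server_pod(
    num_servers: int,
    cost: Callable[[int], float],
    d: int = 2,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample d distinct servers uniformly at random; return the cheapest one.

    Only d cost evaluations are made, so the work per task does not
    grow with the number of servers.
    """
    rng = rng or random.Random()
    candidates = rng.sample(range(num_servers), min(d, num_servers))
    return min(candidates, key=cost)


def pick_server_full_scan(num_servers: int, cost: Callable[[int], float]) -> int:
    """Baseline: evaluate every server, which costs O(num_servers) per task."""
    return min(range(num_servers), key=cost)


# Hypothetical per-server workloads, used only for illustration.
workloads = [4.0, 1.0, 3.0, 0.5, 2.0]


def workload_cost(server: int) -> float:
    return workloads[server]


chosen = pick_server_pod(len(workloads), workload_cost, d=2, rng=random.Random(0))
best = pick_server_full_scan(len(workloads), workload_cost)
```

With $d$ fixed, the sampled variant evaluates a constant number of servers per task, which is the source of the complexity reduction; the trade-off is that it may miss the globally least-loaded server.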
Findings
Balanced-Pandas-Pod outperforms simple Balanced-Pandas at low and medium loads.
Balanced-Pandas-Pod performs nearly as well as Balanced-Pandas at high loads.
The scheduling complexity of the proposed algorithms is reduced to $O(1)$, enabling faster scheduling and energy savings.
Abstract
The MapReduce framework is the de facto standard in big data and its applications, where a large data set is split into small data chunks that are replicated on different servers among thousands of servers. The heterogeneous server structure of the system makes scheduling much harder than in systems with homogeneous servers. Throughput optimality of the system on one hand and delay optimality on the other create a dilemma for assigning tasks to servers. The JSQ-MaxWeight and Balanced-Pandas algorithms are the state-of-the-art algorithms with theoretical guarantees on throughput and delay optimality for systems with two and three levels of data locality. However, the scheduling complexity of these two algorithms is too high. Hence, we use the power of choices algorithm combined with the Balanced-Pandas algorithm and the JSQ-MaxWeight algorithm, and compare the complexity…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Distributed systems and fault tolerance
