TL;DR
This paper introduces an asynchronous finite-time distributed coordination algorithm for CPU scheduling in data centers, enabling efficient, robust, and near-optimal task allocation despite communication delays and node variability.
Contribution
It presents a novel asynchronous distributed scheduling scheme with guaranteed finite-time convergence, improving robustness and efficiency in data center resource management.
Findings
Achieves finite-time convergence to schedules within a pre-specified bound of the optimum
Robust against communication delays and straggler nodes
Demonstrates state-of-the-art performance in simulations
Abstract
We propose an asynchronous iterative scheme that allows a set of interconnected nodes to reach agreement in a distributed manner, within a pre-specified bound and in a finite number of steps. While this scheme could be adopted in a wide variety of applications, we discuss it within the context of task scheduling for data centers. In this context, the algorithm is guaranteed to approximately converge to the optimal scheduling plan, given the available resources, in a finite number of steps. Furthermore, because it is asynchronous, the proposed scheme accounts for the uncertainty introduced by straggler nodes or by communication issues in the form of latency variability, while still converging to the target objective. In addition, through an extensive empirical evaluation based on simulations, we show that the proposed method exhibits state-of-the-art performance.
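The abstract does not spell out the update rule, so the following is only a minimal Python sketch of the general idea under stated assumptions. It is not the authors' algorithm. It swaps in asynchronous push-sum (ratio) consensus: each node starts with a hypothetical workload and CPU capacity, and every node's estimate tends to total workload divided by total capacity. We assume this ratio defines the balanced schedule, in which node i receives that ratio times its capacity. Asynchrony is modeled with random wake-ups and random bounded message delays. The stopping test is a simulation-only oracle that checks whether all estimates, including the ratios carried by in-flight messages, lie within a pre-specified bound `eps`; a real deployment would need a distributed stopping mechanism instead. The function name `async_ratio_consensus` and all of its parameters are hypothetical.

```python
import random


def async_ratio_consensus(workloads, capacities, neighbors, eps=1e-3,
                          wake_prob=0.5, max_delay=3, max_steps=100_000, seed=0):
    """Hypothetical sketch: asynchronous push-sum (ratio) consensus.

    Node i holds a mass pair (y_i, z_i), initialised to (workload_i, capacity_i).
    Its estimate y_i / z_i tends to sum(workloads) / sum(capacities).
    `neighbors[i]` lists the out-neighbours of node i (a strongly connected
    digraph is assumed). Capacities must be positive.
    Returns (steps_taken, estimates).
    """
    rng = random.Random(seed)
    n = len(workloads)
    y = [float(w) for w in workloads]
    z = [float(c) for c in capacities]
    in_flight = []  # messages as (arrival_step, dest, dy, dz)

    for step in range(max_steps):
        # Deliver every message whose random delay has elapsed.
        arrived = [m for m in in_flight if m[0] <= step]
        in_flight = [m for m in in_flight if m[0] > step]
        for _, dest, dy, dz in arrived:
            y[dest] += dy
            z[dest] += dz

        # Asynchrony: each node independently wakes up with prob. wake_prob.
        for i in range(n):
            if rng.random() < wake_prob:
                share = 1.0 / (len(neighbors[i]) + 1)
                dy, dz = y[i] * share, z[i] * share
                y[i], z[i] = dy, dz  # keep one share locally
                for j in neighbors[i]:
                    # Each push is delayed by 1..max_delay steps.
                    in_flight.append((step + rng.randint(1, max_delay), j, dy, dz))

        # Simulation-only stopping oracle: the global target is bracketed by
        # the node estimates together with the ratios of in-flight messages.
        estimates = [y[i] / z[i] for i in range(n)]
        pending = [dy / dz for _, _, dy, dz in in_flight]
        if max(estimates + pending) - min(estimates + pending) <= eps:
            return step + 1, estimates

    raise RuntimeError("no agreement within max_steps")


if __name__ == "__main__":
    load = [10, 20, 30, 40]      # hypothetical pending work per node
    cap = [4, 2, 2, 2]           # hypothetical CPU capacity per node
    ring = [[1], [2], [3], [0]]  # directed ring: 0 -> 1 -> 2 -> 3 -> 0
    steps, est = async_ratio_consensus(load, cap, ring)
    plan = [r * c for r, c in zip(est, cap)]  # assumed balanced allocation
    print(steps, [round(p, 2) for p in plan])
```

Because total mass is conserved and every node estimate or in-flight ratio is a weighted mediant of earlier ones, the global target always lies between the smallest and largest of these ratios. Stopping once that spread drops below `eps` therefore leaves every node within `eps` of the target. With the example inputs the target ratio is 100 / 10 = 10, so the plan approaches 40, 20, 20, 20.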
