Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

TL;DR
This paper presents a co-optimization algorithm for distributed machine learning over a network of computing centers. It schedules CPU resources while each node trains on its local data, with proven convergence and a reduction of over 50% in the cost optimality gap compared to existing solutions.
Contribution
It introduces a new distributed co-optimization framework for CPU scheduling and data processing in networked computing centers, with convergence guarantees and quantization handling.
Findings
Over 50% reduction in cost optimality gap compared to existing solutions
Convergence proven using Lyapunov stability and eigen-spectrum analysis
Algorithm supports time-varying networks and log-quantized data exchange
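As a rough illustration of how log-quantized exchange can coexist with a fixed total CPU budget, the sketch below runs a generic Laplacian-gradient style update over symmetric (balanced) link weights. It is not the paper's algorithm, which this summary does not reproduce. The function names, the quadratic costs `a[i] * x[i]**2`, the step size `eta`, and the quantization level `rho` are all assumptions made for the example.

```python
import math


def log_quantize(z, rho=0.125):
    """Logarithmic quantizer: rounds log|z| to the nearest multiple of rho.

    The sign is preserved and the relative error is at most exp(rho/2) - 1.
    """
    if z == 0.0:
        return 0.0
    level = rho * round(math.log(abs(z)) / rho)
    return math.copysign(math.exp(level), z)


def allocation_step(x, grads, weights, eta):
    """One update in which each node sees only quantized gradients.

    With symmetric weights, the pairwise terms cancel in the sum over
    nodes, so sum(x) is unchanged (up to floating-point rounding).
    """
    n = len(x)
    q = [log_quantize(g) for g in grads]
    return [
        x[i] - eta * sum(weights[i][j] * (q[i] - q[j]) for j in range(n))
        for i in range(n)
    ]


def total_cost(a, x):
    """Hypothetical quadratic cost: sum of a_i * x_i^2."""
    return sum(ai * xi * xi for ai, xi in zip(a, x))


def run(a, weights, x0, eta=0.02, steps=500):
    """Iterate the update from a feasible start x0 (sum(x0) is the budget)."""
    x = list(x0)
    for _ in range(steps):
        grads = [2.0 * ai * xi for ai, xi in zip(a, x)]
        x = allocation_step(x, grads, weights, eta)
    return x


# Hypothetical example: three nodes, fully connected, unit link weights.
a = [1.0, 2.0, 4.0]
weights = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
x0 = [4.0, 4.0, 4.0]  # total budget of 12 CPU units, split evenly
x_final = run(a, weights, x0)
```

Because every iterate keeps the same total as `x0`, the allocation can be stopped at any step without violating the budget. That is the sense in which such an update is feasible at all times. Quantization limits how precisely nodes can compare gradients, so this sketch should be read as showing the constraint-preserving structure rather than any particular accuracy.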
Abstract
In the rapidly evolving research on artificial intelligence (AI), the demand for fast, computationally efficient, and scalable solutions has increased in recent years. This paper considers the problem of optimizing computing resources for distributed machine learning (ML) and optimization. Given a set of data distributed over a network of computing nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally on its own share of the data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing…
