Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle
Michael Kaufmann, Kornilios Kourtis, Celestine Mendler-Dünner, Adrian Schüpbach, Thomas Parnell

TL;DR
Chicle is a novel elastic distributed training framework for machine learning that avoids micro-tasks, enabling efficient load balancing and elasticity without compromising convergence or performance.
Contribution
This paper introduces Chicle, a new framework that supports elasticity and load balancing in distributed machine learning without relying on micro-tasks, improving efficiency and scalability.
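One way to obtain elasticity and load balancing without micro-tasks can be sketched as follows. This is our own assumption about the general flavor of such a design, not Chicle's documented implementation or API. Each worker runs a single long-lived training task over a set of coarse data chunks, and when workers join, leave, or change speed, only the surplus chunks move. The functions target_counts and rebalance and the relative-throughput input are hypothetical. Because the number of tasks stays equal to the number of workers, the degree of parallelism seen by the training algorithm does not grow with the number of chunks.

```python
from typing import Dict, List


def target_counts(num_chunks: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Hypothetical helper: chunks per worker, proportional to relative throughput.

    Uses largest-remainder rounding so the counts always sum to num_chunks.
    """
    total = sum(throughput.values())
    quotas = {w: num_chunks * t / total for w, t in throughput.items()}
    counts = {w: int(q) for w, q in quotas.items()}
    leftover = num_chunks - sum(counts.values())
    # Hand the remaining chunks to the workers with the largest fractional quotas
    # (ties broken alphabetically by worker name).
    for k in sorted(quotas, key=lambda k: (-(quotas[k] - counts[k]), k))[:leftover]:
        counts[k] += 1
    return counts


def rebalance(assignment: Dict[str, List[int]],
              throughput: Dict[str, float]) -> Dict[str, List[int]]:
    """Hypothetical helper: move as few chunks as possible so each active worker
    reaches its target share.

    assignment: current worker -> chunk ids (may include workers that are leaving).
    throughput: relative speed of every worker that should be active afterwards.
    """
    num_chunks = sum(len(c) for c in assignment.values())
    targets = target_counts(num_chunks, throughput)
    new = {w: list(assignment.get(w, [])) for w in throughput}
    # Chunks owned by departing workers must be re-homed.
    pool = [c for w, cs in assignment.items() if w not in throughput for c in cs]
    # Overloaded workers give up their surplus chunks.
    for w, chunks in new.items():
        excess = len(chunks) - targets[w]
        if excess > 0:
            pool.extend(chunks[-excess:])
            del chunks[-excess:]
    # Underloaded (including newly joined) workers take chunks from the pool.
    for w, chunks in new.items():
        deficit = targets[w] - len(chunks)
        if deficit > 0:
            chunks.extend(pool[:deficit])
            del pool[:deficit]
    return new
```

For example, when a third, equally fast worker joins two workers that hold six chunks each, only four chunks move, and the existing workers keep the rest of their local data.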
Findings
Chicle achieves performance comparable to state-of-the-art rigid (non-elastic) frameworks.
It effectively enables elastic execution and dynamic load balancing.
Chicle supports training deep neural networks and generalized linear models.
Abstract
Distributed machine learning training is one of the most common and important workloads running in data centers today, but it is rarely executed alone. Instead, to reduce costs, computing resources are consolidated and shared by different applications. In this scenario, elasticity and proper load balancing are vital to maximize efficiency, fairness, and utilization. Currently, most distributed training frameworks do not support these properties. The few exceptions that do support elasticity imitate generic distributed frameworks and use micro-tasks. In this paper we illustrate that micro-tasks are problematic for machine learning applications, because they require a high degree of parallelism, which hinders the convergence of distributed training at a pure algorithmic level (i.e., ignoring overheads and scalability limitations). To address this, we propose Chicle, a new…
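The algorithmic effect described in the abstract can be reproduced in a toy setting. The sketch below is our own illustration, not code from the paper: it assumes local SGD with synchronous model averaging on a noiseless least-squares problem as a stand-in for the training algorithms Chicle targets, and the names make_data, dot, loss and local_sgd are hypothetical. Each of num_tasks parallel tasks runs one SGD pass over its partition, starting from the shared model, and the task-local models are then averaged. Every setting of num_tasks performs the same number of gradient evaluations per epoch, yet finer partitioning leaves fewer sequential updates per epoch; with one sample per task, each epoch reduces to a single full-gradient step.

```python
import random
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def make_data(n: int = 256, d: int = 10, seed: int = 0) -> Tuple[List[Vector], Vector]:
    """Synthetic noiseless linear-regression data: y_i = <w_true, x_i>."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [dot(w_true, x) for x in X]
    return X, y


def loss(w: Vector, X: List[Vector], y: Vector) -> float:
    """Half mean squared error over the full dataset."""
    return sum((dot(w, x) - t) ** 2 for x, t in zip(X, y)) / (2 * len(X))


def local_sgd(X: List[Vector], y: Vector, num_tasks: int,
              epochs: int = 5, lr: float = 0.01) -> Vector:
    """Each epoch: num_tasks tasks run one SGD pass over their partition,
    then the task-local models are averaged into the shared model."""
    n, d = len(X), len(X[0])
    # Round-robin partitioning: task k owns samples k, k + num_tasks, ...
    parts = [range(k, n, num_tasks) for k in range(num_tasks)]
    w = [0.0] * d
    for _ in range(epochs):
        local_models = []
        for idx in parts:
            wk = list(w)  # every task starts from the current shared model
            for i in idx:
                err = dot(wk, X[i]) - y[i]
                wk = [a - lr * err * b for a, b in zip(wk, X[i])]
            local_models.append(wk)
        # Synchronous step: new shared model = average of the task-local models.
        w = [sum(coords) / num_tasks for coords in zip(*local_models)]
    return w


if __name__ == "__main__":
    X, y = make_data()
    for k in (1, 16, 256):
        print(f"num_tasks={k:3d}  loss after 5 epochs: {loss(local_sgd(X, y, k), X, y):.3e}")
```

Under these assumptions the loss after five epochs grows with num_tasks (1, then 16, then 256 tasks), which mirrors the convergence penalty the abstract attributes to the high parallelism that micro-tasks require. Communication and scheduling overheads are deliberately absent, matching the abstract's "pure algorithmic level".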
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
