Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle

Michael Kaufmann; Kornilios Kourtis; Celestine Mendler-Dünner; Adrian Schüpbach; Thomas Parnell

arXiv:1909.04885 · cs.LG · September 12, 2019

Open Access

TL;DR

Chicle is a novel elastic distributed training framework for machine learning that avoids micro-tasks, enabling efficient load balancing and elasticity without compromising convergence or performance.

Contribution

This paper introduces Chicle, a new framework that supports elasticity and load balancing in distributed machine learning without relying on micro-tasks, improving efficiency and scalability.

Findings

01. Chicle achieves performance comparable to state-of-the-art rigid frameworks.

02. It effectively enables elastic execution and dynamic load balancing.

03. Chicle supports training deep neural networks and generalized linear models.

Abstract

Distributed machine learning training is one of the most common and important workloads running in data centers today, but it is rarely executed alone. Instead, to reduce costs, computing resources are consolidated and shared by different applications. In this scenario, elasticity and proper load balancing are vital to maximize efficiency, fairness, and utilization. Currently, most distributed training frameworks support neither elasticity nor load balancing. The few exceptions that do support elasticity imitate generic distributed frameworks and use micro-tasks. In this paper we illustrate that micro-tasks are problematic for machine learning applications, because they require a high degree of parallelism, which hinders the convergence of distributed training at a purely algorithmic level (i.e., ignoring overheads and scalability limitations). To address this, we propose Chicle, a new…
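The abstract's claim that a high degree of parallelism can slow convergence even when overheads are ignored can be illustrated with a toy experiment. The sketch below is not Chicle and not the authors' setup; it is a minimal model-averaging scheme on a synthetic, noiseless least-squares problem, chosen here only for illustration. Each epoch splits the data into `num_tasks` equal partitions, runs one sequential SGD pass per partition starting from the shared model, and averages the resulting models. All names (`make_data`, `local_sgd_pass`, `train`) and all parameter values are assumptions made for this example.

```python
import random


def make_data(n=256, d=5, seed=0):
    """Synthetic noiseless linear regression data (illustrative only)."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(xi * wi for xi, wi in zip(x, w_true)) for x in X]
    return X, y


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    total = 0.0
    for x, t in zip(X, y):
        pred = sum(xi * wi for xi, wi in zip(x, w))
        total += (pred - t) ** 2
    return total / len(y)


def local_sgd_pass(w, X, y, lr):
    """One sequential SGD pass over a partition, starting from a copy of w."""
    w = list(w)
    for x, t in zip(X, y):
        err = sum(xi * wi for xi, wi in zip(x, w)) - t
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def train(X, y, num_tasks, epochs=5, lr=0.01):
    """Per epoch: every task trains on its own partition, then models are averaged.

    Assumes len(X) is divisible by num_tasks.
    """
    n, d = len(X), len(X[0])
    size = n // num_tasks
    w = [0.0] * d
    for _ in range(epochs):
        local_models = []
        for k in range(num_tasks):
            lo, hi = k * size, (k + 1) * size
            local_models.append(local_sgd_pass(w, X[lo:hi], y[lo:hi], lr))
        # Average the local models coordinate by coordinate.
        w = [sum(col) / num_tasks for col in zip(*local_models)]
    return mse(w, X, y)


X, y = make_data()
# Same data, same number of epochs, same learning rate; only the task count varies.
losses = {k: train(X, y, k) for k in (1, 16, 256)}
for k, loss in losses.items():
    print(f"tasks={k:3d}  final MSE={loss:.3e}")
```

Under these assumptions, every configuration processes each sample exactly once per epoch, so any difference in final loss comes from how the work is split rather than from communication or scheduling overheads. With one task the run is plain sequential SGD; with one sample per task it degenerates to a single averaged gradient step per epoch.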

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques