99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Konstantin Mishchenko; Filip Hanzely; Peter Richtárik

arXiv:1901.09437 · cs.LG · June 5, 2019 · 5 cites


TL;DR

This paper identifies a communication inefficiency in common distributed optimization methods and proposes a sparsification method that greatly reduces the data workers transfer with minimal impact on convergence, improving scalability in the number of workers.

Contribution

The authors introduce a novel update-sparsification technique for distributed optimization that maintains theoretical convergence rates while drastically reducing communication overhead.
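To make "update sparsification" concrete, here is a minimal sketch of a generic unbiased random sparsifier: each worker keeps every gradient coordinate independently with probability `p` and rescales kept entries by `1/p`, so the sparse message equals the dense gradient in expectation. This is an illustrative assumption, not necessarily the authors' exact scheme (the abstract is truncated before it is specified). The names `rand_sparsify`, `master_average`, `p`, and the example sizes are hypothetical.

```python
import random
from typing import Dict, List


def rand_sparsify(grad: List[float], p: float, rng: random.Random) -> Dict[int, float]:
    """Hypothetical sparsifier: keep each coordinate independently with probability p
    and rescale it by 1/p, so E[message_j] = grad_j (unbiased)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Only the kept (index, value) pairs would be sent over the network.
    return {j: g / p for j, g in enumerate(grad) if rng.random() < p}


def master_average(messages: List[Dict[int, float]], dim: int) -> List[float]:
    """Average sparse worker messages into a dense vector (missing entries count as 0)."""
    total = [0.0] * dim
    for msg in messages:
        for j, v in msg.items():
            total[j] += v
    return [t / len(messages) for t in total]


# Hypothetical setup: 4 workers, 10,000 coordinates, keep about 1% per worker.
rng = random.Random(0)
dim, n_workers, p = 10_000, 4, 0.01
grads = [[1.0] * dim for _ in range(n_workers)]
msgs = [rand_sparsify(g, p, rng) for g in grads]
sent = sum(len(m) for m in msgs)
print(sent, "of", n_workers * dim, "coordinates sent")  # roughly 1% of 40,000
avg = master_average(msgs, dim)  # unbiased, but noisier, estimate of the mean gradient
```

The `1/p` rescaling keeps the master's average unbiased at the cost of extra variance; with `p = 0.01`, each worker sends about 1% of its coordinates per round.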

Findings

1. Communication cost is reduced by 99% with minimal impact on convergence.
2. The proposed method matches theoretical predictions in experiments.
3. Significant speedups are observed on both synthetic and real datasets.

Abstract

Many popular distributed optimization methods for training machine learning models fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs averaging. The average is broadcast back to the workers, which use it to perform a gradient-type step to update the local version of the model. It is also well known that many such methods, including SGD, SAGA, and accelerated SGD for over-parameterized models, do not scale well with the number of parallel workers. In this paper we observe that the above template is fundamentally inefficient in that too much data is unnecessarily communicated by the workers, which slows down the overall system. We propose a fix based on a new update-sparsification method we develop in this work, which we suggest be used on top of existing methods. Namely, we develop a…
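The communication template described in the abstract can be simulated in a few lines. The sketch below is a minimal illustration, assuming full (non-stochastic) local gradients and a hypothetical quadratic loss per worker; `distributed_gd` and the example data are made up for illustration. It shows where the cost comes from: every round, each of the `n` workers sends a dense `dim`-length vector to the master.

```python
from typing import Callable, List

Vector = List[float]


def distributed_gd(
    local_grads: List[Callable[[Vector], Vector]],
    x0: Vector,
    lr: float,
    steps: int,
) -> Vector:
    """Simulate the template: workers compute local gradients, the master averages
    them and broadcasts the average, and every worker takes the same gradient step
    (so all local model copies stay identical)."""
    n, dim = len(local_grads), len(x0)
    x = list(x0)
    for _ in range(steps):
        # 1) Each worker computes its gradient independently (a dense vector).
        grads = [g(x) for g in local_grads]
        # 2) Master averages: n * dim floats are received per round.
        avg = [sum(gr[j] for gr in grads) / n for j in range(dim)]
        # 3) Average is broadcast back; each worker applies a gradient step.
        x = [xj - lr * aj for xj, aj in zip(x, avg)]
    return x


# Hypothetical example: worker i holds f_i(x) = 0.5 * ||x - c_i||^2, gradient x - c_i.
centers = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
workers = [lambda x, c=c: [xj - cj for xj, cj in zip(x, c)] for c in centers]
x_final = distributed_gd(workers, [0.0, 0.0], lr=0.5, steps=60)
print(x_final)  # close to [2.0, 2.0], the mean of the centers
```

The default argument `c=c` binds each worker's center at creation time; without it, every lambda would capture the last center.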


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: SAGA · Stochastic Gradient Descent