ATOMO: Communication-efficient Learning via Atomic Sparsification

Hongyi Wang; Scott Sievert; Zachary Charles; Shengchao Liu; Stephen Wright; Dimitris Papailiopoulos

arXiv:1806.04090 · stat.ML · November 12, 2018 · 129 cites

Open Access · 1 Repo

TL;DR

ATOMO introduces a versatile framework for gradient sparsification in distributed learning, unifying existing methods and enabling faster training by atomic decomposition-based gradient compression.

Contribution

The paper presents ATOMO, a general atomic sparsification framework that unifies various gradient compression techniques and improves training speed.

Findings

01

Recent methods like QSGD and TernGrad are special cases of ATOMO.

02

Sparsifying the singular value decomposition (SVD) of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.

03

Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO produces a random unbiased sparsification of the atoms that minimizes variance.
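
The variance-minimization finding can be made concrete with a short formulation. The notation below (\lambda_i, a_i, p_i, t_i, s) is chosen here for illustration and assumes orthonormal atoms, which holds for element-wise atoms and for the rank-one atoms of an SVD. It is a sketch of the idea, not a restatement of the paper's exact theorem.

```latex
% Atomic decomposition of a gradient g into atoms a_i with coefficients \lambda_i
g = \sum_{i=1}^{n} \lambda_i a_i

% Keep atom i with probability p_i and rescale so the estimate stays unbiased
\hat{g} = \sum_{i=1}^{n} \frac{\lambda_i t_i}{p_i}\, a_i,
\qquad t_i \sim \mathrm{Bernoulli}(p_i),
\qquad \mathbb{E}[\hat{g}] = g

% With orthonormal atoms, the second moment depends only on the p_i
\mathbb{E}\,\|\hat{g}\|_2^2 = \sum_{i=1}^{n} \frac{\lambda_i^2}{p_i}

% Choosing the p_i: minimize variance under an expected-sparsity budget s
\min_{p}\ \sum_{i=1}^{n} \frac{\lambda_i^2}{p_i}
\quad \text{subject to} \quad \sum_{i=1}^{n} p_i \le s,\quad 0 < p_i \le 1
```

Because \mathbb{E}[\hat{g}] is fixed at g, minimizing the second moment is the same as minimizing the variance; the budget s bounds the expected number of atoms that must be communicated.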

Abstract

Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster…
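
The sparsification step described in the abstract can be sketched in a few lines of Python for the element-wise case, where each gradient coordinate is its own atom. This is a minimal illustration and not the code from the hwang595/ATOMO repository: the function names `sampling_probabilities` and `sparsify` are hypothetical, and the probability rule (proportional to coefficient magnitude, capped at 1) is an assumed way of meeting the sparsity budget while keeping the estimate unbiased.

```python
import random


def sampling_probabilities(coeffs, budget):
    """Assumed rule: p_i proportional to |coeff_i|, capped at 1, summing to at most `budget`."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    probs = [0.0] * len(coeffs)
    active = [i for i, c in enumerate(coeffs) if c != 0]
    remaining = float(budget)
    while active and remaining > 0:
        total = sum(abs(coeffs[i]) for i in active)
        # Atoms whose proportional share would reach 1 are always kept.
        capped = [i for i in active if remaining * abs(coeffs[i]) >= total]
        if not capped:
            for i in active:
                probs[i] = remaining * abs(coeffs[i]) / total
            break
        for i in capped:
            probs[i] = 1.0
        remaining -= len(capped)
        active = [i for i in active if probs[i] < 1.0]
    return probs


def sparsify(coeffs, probs, rng):
    """Keep coefficient i with probability p_i and rescale by 1/p_i (unbiased)."""
    return [c / p if p > 0 and rng.random() < p else 0.0
            for c, p in zip(coeffs, probs)]


# Toy gradient with five coordinates and a budget of two expected nonzeros.
coeffs = [4.0, 1.0, 1.0, 0.0, 2.0]
probs = sampling_probabilities(coeffs, budget=2)
print(probs)  # [1.0, 0.25, 0.25, 0.0, 0.5]

# Averaging many sparsified copies should approach the original gradient.
rng = random.Random(0)
trials = 20000
mean = [0.0] * len(coeffs)
for _ in range(trials):
    for i, v in enumerate(sparsify(coeffs, probs, rng)):
        mean[i] += v / trials
print(mean)
```

The same two functions would apply to other decompositions by passing the coefficients of that decomposition (for example, singular values) instead of raw coordinates; computing the decomposition itself is outside this sketch.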

Code & Models

Repositories

hwang595/ATOMO
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning