Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance

Alexander Stollenwerk; Laurent Jacques

arXiv:2405.11095 · cs.LG · May 21, 2024

TL;DR

This paper introduces FO-SGD, a distributed SGD algorithm that flattens stochastic gradients with a randomized fast Walsh-Hadamard transform and then compresses them with dithered one-bit quantization, obtaining convergence guarantees while keeping the variance of the compressed gradients under control.
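To make the Walsh-Hadamard part concrete, here is a minimal NumPy sketch of a randomized fast Walsh-Hadamard transform used as a flattening step. It assumes a power-of-two dimension and random ±1 sign flips as the randomization. The names `fwht`, `flatten`, and `unflatten` are hypothetical, and this is not the authors' code. The example shows that a 1-sparse vector is spread evenly over all coordinates while its norm is preserved.

```python
import numpy as np


def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform (Sylvester order), O(n log n).

    The length of x must be a power of two.
    """
    y = np.asarray(x, dtype=float)
    n = y.shape[0]
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly: pair each block of h entries with the next block of h entries
        blocks = y.reshape(-1, 2, h)
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        y = np.stack((a + b, a - b), axis=1).reshape(n)
        h *= 2
    return y / np.sqrt(n)  # scale so the transform is orthogonal (and self-inverse)


def flatten(g: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Hypothetical randomized flattening: random sign flips, then the normalized FWHT."""
    return fwht(signs * g)


def unflatten(z: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Inverse of flatten: apply the self-inverse FWHT, then undo the sign flips."""
    return signs * fwht(z)


rng = np.random.default_rng(0)
n = 16
signs = rng.choice([-1.0, 1.0], size=n)  # random +/-1 diagonal shared by sender and receiver
g = np.zeros(n)
g[3] = 5.0                               # a 1-sparse "gradient"
z = flatten(g, signs)
print(np.abs(z))                         # every entry has magnitude 5 / sqrt(16) = 1.25
print(np.allclose(unflatten(z, signs), g))  # True
```

Because the normalized Hadamard matrix is symmetric and orthogonal, the same `fwht` routine is its own inverse, so undoing the flattening costs only another O(n log n) pass plus the sign flips.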

Contribution

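The paper presents a gradient compression method that combines dithered one-bit quantization with a randomized Walsh-Hadamard transform. The resulting gradient estimate is biased, but its variance remains controlled, which enables communication-efficient distributed optimization with convergence guarantees.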
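The dithering ingredient can be illustrated with the textbook dithered one-bit quantizer q(x) = λ · sign(x + τ), with τ drawn uniformly from [-λ, λ]. This is a generic sketch under stated assumptions, not the paper's exact quantizer: the scale λ = max|x_i| and the function name `one_bit_dither` are hypothetical choices. For |x_i| ≤ λ, P(x_i + τ_i > 0) = (λ + x_i) / (2λ), so E[q_i] = x_i and Var[q_i] = λ² - x_i². The abstract notes that the FO-SGD gradient estimate itself is biased, so the paper's construction differs from this version in details not given here.

```python
import numpy as np


def one_bit_dither(x: np.ndarray, lam: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical dithered one-bit quantizer: lam * sign(x + tau), tau ~ U[-lam, lam].

    Only the sign of each coordinate (one bit) and the shared scale lam are needed
    to reconstruct the output. The estimate is unbiased whenever lam >= max |x_i|.
    """
    tau = rng.uniform(-lam, lam, size=x.shape)   # dither, independent per coordinate
    bits = (x + tau) > 0                          # the one-bit payload
    return lam * np.where(bits, 1.0, -1.0)        # decode bits back to +/- lam


rng = np.random.default_rng(1)
x = np.array([0.3, -0.7, 0.05, 1.0])
lam = float(np.max(np.abs(x)))                    # assumed scale: max |x_i|
samples = one_bit_dither(np.tile(x, (200_000, 1)), lam, rng)  # 200k independent draws
print(samples.mean(axis=0))   # empirical mean, close to x
print(samples.var(axis=0))    # empirical variance, close to lam**2 - x**2
```

The variance λ² - x_i² is largest for coordinates near zero, which is why the choice of λ relative to the typical entry size matters so much for one-bit schemes.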

Findings

1. Achieves SGD-like convergence guarantees under mild conditions.
2. Avoids both the exploding variance common in one-bit compression and the performance deterioration commonly seen with sparse gradients.
3. Supports full communication compression in distributed settings.
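The sparse-gradient finding can be checked numerically in a toy setting. The sketch below is an illustration, not the paper's analysis. It assumes the generic dithered quantizer λ · sign(x + τ) with τ uniform on [-λ, λ] and the hypothetical scale λ = max|x_i|, for which the total quantization variance is n·λ² - ‖x‖². It compares that variance for a 4-sparse vector in dimension 1024 with and without a randomized Hadamard flattening, built here as an explicit O(n²) matrix for brevity instead of a fast transform; the helper names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester-Hadamard matrix of order n (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


def dither_error_variance(x: np.ndarray) -> float:
    """Total variance of lam*sign(x + U[-lam, lam]) with the assumed scale lam = max|x_i|.

    Per coordinate the variance is lam^2 - x_i^2, so the total is n*lam^2 - ||x||^2.
    """
    lam = np.max(np.abs(x))
    return float(x.size * lam**2 - np.sum(x**2))


rng = np.random.default_rng(0)
n = 1024
g = np.zeros(n)
g[rng.choice(n, size=4, replace=False)] = rng.standard_normal(4)  # 4-sparse gradient

H = sylvester_hadamard(n)
signs = rng.choice([-1.0, 1.0], size=n)
g_flat = H @ (signs * g)   # randomized flattening; orthogonal, so ||g_flat|| = ||g||

# Orthogonality also means the error variance is the same before or after unflattening.
raw = dither_error_variance(g)
flat = dither_error_variance(g_flat)
print(f"relative variance without flattening: {raw / np.sum(g**2):.1f}")
print(f"relative variance with flattening:    {flat / np.sum(g**2):.1f}")
```

Since at most 4 entries are nonzero, ‖g‖² ≤ 4λ² before flattening, so the relative variance is at least 1024/4 - 1 = 255. After flattening every entry is bounded by (Σ|g_i|)/32, which caps the relative variance at 3 in this toy setting.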

Abstract

We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas: (i) a one-bit quantization procedure leveraging the technique of dithering, and (ii) a randomized fast Walsh-Hadamard transform to flatten the stochastic gradient before quantization. As a result, the approximation of the true gradient in this scheme is biased, but it prevents commonly encountered algorithmic problems, such as exploding variance in the one-bit compression regime, deterioration of performance in the case of sparse gradients, and restrictive assumptions on the distribution of the stochastic gradients. In fact, we show SGD-like convergence guarantees under mild conditions. The…
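Putting the two ideas from the abstract together, the following sketch simulates distributed SGD in a parameter-server loop on a synthetic least-squares problem. It is an illustration under assumptions, not the FO-SGD algorithm as specified in the paper. Workers draw fresh shared random signs each round (standing in for a common seed), flatten their local gradient with a fast Walsh-Hadamard transform, and send one sign bit per coordinate plus one float scale λ = max|z_i|; the server decodes, averages, and unflattens. Only the worker-to-server messages are compressed here. The names `worker_encode` and `server_decode`, the step size, and the problem sizes are hypothetical.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    y, h, n = np.asarray(x, dtype=float), 1, len(x)
    while h < n:
        blocks = y.reshape(-1, 2, h)             # butterfly over pairs of h-blocks
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        y = np.stack((a + b, a - b), axis=1).reshape(n)
        h *= 2
    return y / np.sqrt(n)


def worker_encode(grad, signs, rng):
    """Flatten, then dithered one-bit quantize; returns (sign bits, scale)."""
    z = fwht(signs * grad)
    lam = np.max(np.abs(z))                      # hypothetical scale choice: max |z_i|
    bits = z + rng.uniform(-lam, lam, size=z.shape) > 0
    return bits, lam


def server_decode(messages, signs):
    """Average the decoded one-bit messages and undo the flattening."""
    z_avg = np.mean([lam * np.where(bits, 1.0, -1.0) for bits, lam in messages], axis=0)
    return signs * fwht(z_avg)                   # normalized FWHT is its own inverse


rng = np.random.default_rng(0)
d, n_workers, rows = 64, 8, 128
A = rng.standard_normal((n_workers * rows, d))
x_true = rng.standard_normal(d)
b = A @ x_true
shards = np.split(np.arange(len(b)), n_workers)  # each worker owns a block of rows


def loss(x):
    return 0.5 * np.mean((A @ x - b) ** 2)


x, lr = np.zeros(d), 0.05
initial = loss(x)
for _ in range(400):
    signs = rng.choice([-1.0, 1.0], size=d)      # shared randomness for this round
    msgs = []
    for idx in shards:
        g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)  # local gradient of the worker
        msgs.append(worker_encode(g, signs, rng))
    x -= lr * server_decode(msgs, signs)         # server applies the averaged estimate
print(f"loss: {initial:.3g} -> {loss(x):.3g}")
```

In this toy run each worker's uplink message is 64 bits plus one float instead of 64 floats. Because λ is at least as large as every flattened coordinate, this particular decoder is unbiased, which differs from the biased estimate described in the abstract.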

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques