DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
Hanlin Tang, Xiangru Lian, Chen Yu, Tong Zhang, Ji Liu

TL;DR
DoubleSqueeze is a parallel stochastic gradient method that applies error-compensated compression in both communication passes. It works with arbitrary compression techniques and supports linear speedup in the number of workers, which improves convergence and scalability in distributed machine learning.
Contribution
The paper analyzes a two-pass communication model with error compensation and reports improved convergence and scalability over existing compressed methods.
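To make the two-pass idea concrete, here is a minimal sketch of one training step, assuming a parameter-server layout in which each worker and the server keep their own compression residual. The names `top_k` and `double_pass_step` are hypothetical, top-k sparsification is only one possible compressor, and the update shown is an illustration rather than the authors' reference implementation.

```python
def top_k(vec, k):
    """Hypothetical compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def double_pass_step(x, grads, worker_err, server_err, lr, compress):
    """One two-pass step; updates worker_err and server_err in place, returns new x."""
    n, dim = len(grads), len(x)

    # Pass 1 (workers -> server): each worker adds its stored residual to its
    # local gradient, compresses the sum, and keeps what the compression dropped.
    messages = []
    for i in range(n):
        v = [grads[i][j] + worker_err[i][j] for j in range(dim)]
        q = compress(v)
        worker_err[i] = [v[j] - q[j] for j in range(dim)]
        messages.append(q)

    # Pass 2 (server -> workers): the server averages the compressed messages,
    # adds its own residual, compresses again, and keeps the new residual.
    v = [sum(m[j] for m in messages) / n + server_err[j] for j in range(dim)]
    q = compress(v)
    server_err[:] = [v[j] - q[j] for j in range(dim)]

    # Every worker applies the same doubly compressed update.
    return [x[j] - lr * q[j] for j in range(dim)]


# Example: two workers, four parameters, keep only one coordinate per message.
x = [1.0, -2.0, 0.5, 3.0]
grads = [[0.4, -1.0, 0.1, 2.0], [0.2, -3.0, 0.3, 1.0]]
worker_err = [[0.0] * len(x) for _ in grads]
server_err = [0.0] * len(x)
x_new = double_pass_step(x, grads, worker_err, server_err, 0.1, lambda v: top_k(v, 1))
```

In this sketch nothing is discarded for good: whatever a compression step drops is stored in a residual and added back before the next compression, so the applied updates plus the stored residuals always account for the full averaged gradient.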
Findings
Compatible with arbitrary compression techniques.
Achieves a better convergence rate than compressed methods without error compensation, such as QSGD and sparse SGD.
Supports linear speedup with the number of workers.
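The compatibility claim in the findings above can be illustrated with a small sketch in which the error-compensation logic never looks inside the compressor. The class `ErrorFeedback` and the two compressors `scaled_sign` (biased) and `stochastic_round` (unbiased) are hypothetical names chosen for this example and are not taken from the paper.

```python
import math
import random


def scaled_sign(vec):
    """Biased compressor: the sign of each entry times the mean magnitude."""
    scale = sum(abs(v) for v in vec) / len(vec)
    return [scale if v >= 0 else -scale for v in vec]


def stochastic_round(vec):
    """Unbiased compressor: round each entry up or down to an integer at random."""
    out = []
    for v in vec:
        low = math.floor(v)
        # Round up with probability equal to the fractional part, so E[out] = v.
        out.append(float(low + 1) if random.random() < v - low else float(low))
    return out


class ErrorFeedback:
    """Wraps any compressor and carries the dropped part over to the next call."""

    def __init__(self, dim, compress):
        self.residual = [0.0] * dim
        self.compress = compress

    def send(self, grad):
        v = [g + r for g, r in zip(grad, self.residual)]
        q = self.compress(v)
        self.residual = [a - b for a, b in zip(v, q)]
        return q


# Swapping the compressor requires no change to the wrapper.
grads = [[0.3, -1.7, 2.2], [1.1, 0.4, -0.9], [-2.5, 0.6, 0.05]]
for compress in (scaled_sign, stochastic_round):
    channel = ErrorFeedback(3, compress)
    sent = [channel.send(g) for g in grads]
```

For either compressor, the sum of the messages sent plus the final residual equals the sum of the original gradients, up to floating-point rounding. This bookkeeping is what lets the compensation step treat the compressor as a black box.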
Abstract
A standard approach in large-scale machine learning is distributed stochastic gradient training, which requires the computation of aggregated stochastic gradients over multiple nodes on a network. Communication is a major bottleneck in such applications, and in recent years, compressed stochastic gradient methods such as QSGD (quantized SGD) and sparse SGD have been proposed to reduce communication. It was also shown that error compensation can be combined with compression to achieve better convergence in a scheme where each node compresses its local stochastic gradient and broadcasts the result to all other nodes over the network in a single pass. However, such a single-pass broadcast approach is not realistic in many practical implementations. For example, under the popular parameter server model for distributed learning, the worker nodes need to send the compressed local gradients to…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Affine Coupling · Normalizing Flows · Stochastic Gradient Descent
