Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

TL;DR
This paper introduces an error compensated quantized SGD algorithm that reduces communication costs in large-scale distributed learning while maintaining convergence speed and accuracy.
Contribution
It proposes a novel gradient quantization method with error compensation, improving training efficiency and convergence in distributed optimization.
Findings
Gradients can be compressed by up to 100x without loss of performance.
Theoretical analysis of the convergence behaviour shows an advantage over competing methods.
Experiments show significant communication reduction with maintained accuracy.
Abstract
Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error compensated quantized stochastic gradient descent algorithm to improve the training efficiency. Local gradients are quantized to reduce the communication overhead, and accumulated quantization error is utilized to speed up the convergence. Furthermore, we present theoretical analysis on the convergence behaviour, and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by a factor of up to two orders of magnitude without performance degradation.
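The mechanism described in the abstract (quantize local gradients, carry the accumulated quantization error into later rounds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the deterministic ternary quantizer, the toy quadratic loss, and the names quantize, Worker, and train are all hypothetical choices made for this example. The paper's actual quantization function and any compensation coefficients are not given in this summary.

```python
def quantize(vec):
    """Map each entry to -scale, 0, or +scale (illustrative ternary quantizer)."""
    scale = max(abs(v) for v in vec)
    if scale == 0.0:
        return [0.0] * len(vec)
    return [scale * round(v / scale) for v in vec]


class Worker:
    """Holds local data (here a target vector) and the accumulated quantization error."""

    def __init__(self, target):
        self.target = list(target)
        self.residual = [0.0] * len(target)

    def local_gradient(self, w):
        # Gradient of the toy loss 0.5 * ||w - target||^2 (an assumption for this demo).
        return [wi - ti for wi, ti in zip(w, self.target)]

    def compress(self, grad):
        # Add back the error left over from earlier rounds, then quantize.
        compensated = [g + r for g, r in zip(grad, self.residual)]
        sent = quantize(compensated)
        # Keep what the quantizer dropped so it can be sent in a later round.
        self.residual = [c - s for c, s in zip(compensated, sent)]
        return sent


def train(workers, dim, lr=0.1, steps=500):
    """Average the quantized messages from all workers and take a plain SGD step."""
    w = [0.0] * dim
    for _ in range(steps):
        messages = [wk.compress(wk.local_gradient(w)) for wk in workers]
        avg = [sum(col) / len(workers) for col in zip(*messages)]
        w = [wi - lr * gi for wi, gi in zip(w, avg)]
    return w


if __name__ == "__main__":
    workers = [Worker([1.0, 3.0]), Worker([3.0, 1.0])]
    # With a constant step size the iterate should settle near the mean target [2.0, 2.0].
    print(train(workers, dim=2))
```

Each worker transmits only a scale and one of three symbols per coordinate instead of a full-precision gradient. The residual kept on the worker is what separates error compensation from plain quantization: the dropped part of the gradient is delayed rather than discarded.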
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
