Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Jiaxiang Wu; Weidong Huang; Junzhou Huang; Tong Zhang

arXiv:1806.08054 · cs.CV · June 22, 2018 · 38 citations

Open Access

TL;DR

This paper introduces an error compensated quantized SGD algorithm that reduces communication costs in large-scale distributed learning while maintaining convergence speed and accuracy.

Contribution

It proposes a novel gradient quantization method with error compensation, improving training efficiency and convergence in distributed optimization.

Findings

01 Gradients can be compressed by up to 100x without loss of performance.

02 A theoretical convergence analysis shows an advantage over competing methods.

03 Experiments show a significant reduction in communication cost while accuracy is maintained.

Abstract

Large-scale distributed optimization is of great importance in various applications. For data-parallel distributed learning, inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error compensated quantized stochastic gradient descent algorithm to improve training efficiency. Local gradients are quantized to reduce the communication overhead, and the accumulated quantization error is used to speed up convergence. Furthermore, we present a theoretical analysis of the convergence behaviour and demonstrate the algorithm's advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by up to two orders of magnitude without performance degradation.
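The abstract's core idea, quantizing each local gradient and carrying the quantization error forward, can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the abstract does not specify the quantizer or how the accumulated error is weighted, so the sketch assumes plain error feedback with a deterministic ternary rounding quantizer. The names `quantize` and `ErrorCompensatedWorker` are hypothetical and chosen only for illustration.

```python
import random


def quantize(vec, levels=1):
    """Assumed quantizer (not from the paper): round each entry to the
    nearest multiple of scale/levels, where scale = max |v|.
    With levels=1 every entry becomes -scale, 0, or +scale."""
    scale = max(abs(v) for v in vec)
    if scale == 0.0:
        return [0.0] * len(vec)
    step = scale / levels
    return [round(v / step) * step for v in vec]


class ErrorCompensatedWorker:
    """Hypothetical worker that keeps the quantization error it has
    not yet transmitted and adds it to the next gradient."""

    def __init__(self, dim):
        self.residual = [0.0] * dim  # accumulated quantization error

    def compress(self, grad):
        # Compensate the fresh gradient with the carried-over error.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sent = quantize(corrected)
        # Whatever the quantizer dropped is remembered for the next step.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


rng = random.Random(0)
dim = 4
worker = ErrorCompensatedWorker(dim)
total_grad = [0.0] * dim
total_sent = [0.0] * dim
for _ in range(50):
    grad = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    sent = worker.compress(grad)
    total_grad = [t + g for t, g in zip(total_grad, grad)]
    total_sent = [t + s for t, s in zip(total_sent, sent)]

# Largest mismatch between what was produced and what was sent plus
# what is still held back in the residual.
gap = max(
    abs(tg - ts - r)
    for tg, ts, r in zip(total_grad, total_sent, worker.residual)
)
```

In this sketch the sum of all transmitted messages plus the final residual equals the sum of all raw gradients up to floating-point rounding, so no gradient information is permanently discarded; it is only delayed. Each message needs one scale value plus a ternary symbol per coordinate, which is where the communication saving comes from.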

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings