Temporal Predictive Coding for Gradient Compression in Distributed Learning
Adrian Edin, Zheng Chen, Michel Kieffer, and Mikael Johansson

TL;DR
This paper introduces a prediction-based gradient compression method for distributed learning that leverages temporal correlation in gradients to reduce communication costs while maintaining convergence.
Contribution
It proposes a novel linear predictor-based compression technique with event-triggered communication, optimizing gradient transmission in distributed learning.
Findings
Achieves significant reduction in communication without sacrificing convergence.
Outperforms existing gradient compression methods in experiments.
Maintains model accuracy with less data transmitted.
Abstract
This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server by exploiting temporal correlation in the local gradients. We use a linear predictor that combines past gradients to form a prediction of the current gradient, with coefficients that are optimized by solving a least-squares problem. In each iteration, every agent transmits the predictor coefficients to the server so that the predicted local gradient can be computed. The difference between the true local gradient and the predicted one, termed the prediction residual, is transmitted only when its norm is above some threshold. When this additional communication step is omitted, the server uses the prediction as the estimated gradient.…
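The scheme described in the abstract can be illustrated with a minimal sketch for one agent and one iteration. It is not the authors' implementation. The function names (`fit_predictor`, `agent_step`, `server_step`), the history length, and the threshold values are hypothetical. The sketch also assumes that the agent and the server hold the same list of past gradient estimates, a detail the abstract does not state.

```python
import numpy as np

def fit_predictor(history, grad):
    """Least-squares fit: find coefficients a minimizing ||grad - H a||_2,
    where the columns of H are the past gradients in `history`."""
    H = np.stack(history, axis=1)                      # shape (d, p)
    coeffs, *_ = np.linalg.lstsq(H, grad, rcond=None)  # shape (p,)
    return coeffs, H @ coeffs

def agent_step(history, grad, threshold):
    """Agent side: always return the coefficients; return the residual
    only when its norm exceeds the threshold (event trigger)."""
    coeffs, prediction = fit_predictor(history, grad)
    residual = grad - prediction
    if np.linalg.norm(residual) > threshold:
        return coeffs, residual
    return coeffs, None

def server_step(history, coeffs, residual):
    """Server side: rebuild the prediction from the shared history and
    add the residual if one was transmitted."""
    estimate = np.stack(history, axis=1) @ coeffs
    if residual is not None:
        estimate = estimate + residual
    return estimate

# Usage with synthetic, temporally correlated gradients (assumed data).
rng = np.random.default_rng(0)
history = [rng.standard_normal(20), rng.standard_normal(20)]
grad = 0.6 * history[0] + 0.4 * history[1] + 0.01 * rng.standard_normal(20)

coeffs, residual = agent_step(history, grad, threshold=1.0)
estimate = server_step(history, coeffs, residual)
print("residual sent:", residual is not None)
print("estimation error:", np.linalg.norm(estimate - grad))
```

In this sketch the agent sends only the p coefficients when the residual is skipped, rather than the d entries of the gradient, which is where the communication saving would come from when p is much smaller than d.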
Taxonomy
Topics: Advanced Data Compression Techniques · Neural Networks and Applications
