Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression
Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Brad Karp, Ran Ben Basat

TL;DR
This paper critically examines gradient compression techniques in distributed machine learning, highlighting issues in previous methods and proposing design improvements to enhance end-to-end training utility.
Contribution
It identifies key shortcomings in existing gradient compression systems and offers strategic design modifications to improve their practical efficiency and evaluation standards.
Findings
Addressed computational overheads in gradient compression
Improved compatibility with all-reduce operations
Enhanced evaluation methods using stronger baselines
Abstract
Gradient aggregation has long been identified as a major bottleneck in today's large-scale distributed machine learning training systems. One promising solution to mitigate such bottlenecks is gradient compression, directly reducing communicated gradient data volume. However, in practice, many gradient compression schemes do not achieve acceleration of the training process while also preserving accuracy. In this work, we identify common issues in previous gradient compression systems and evaluation methodologies. These include excessive computational overheads; incompatibility with all-reduce; and insufficient evaluation methods, such as not using an end-to-end metric or using a 32-bit baseline instead of the stronger 16-bit baseline. We revisit common compression approaches (sparsification, quantization, and low-rank decomposition) and demonstrate how considering the above issues can…
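Two of the issues named in the abstract, all-reduce incompatibility and the choice of baseline precision, can be illustrated with a small sketch. The code below is not the paper's method. It is a toy top-k sparsifier in plain Python, with hypothetical helper names (`top_k_sparsify`, `aggregate_sparse`, `compression_ratio`), made-up gradient values, and an assumed encoding of 16-bit values plus 32-bit indices. It shows one commonly cited reason sparsification sits poorly with all-reduce (workers pick different coordinates, so the aggregate grows at each hop), and how the same scheme looks half as effective when measured against a 16-bit baseline instead of a 32-bit one.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return {index: value}."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in sorted(idx)}


def aggregate_sparse(sparse_grads):
    """Sum sparse gradients from several workers into one {index: value} map."""
    total = {}
    for sg in sparse_grads:
        for i, v in sg.items():
            total[i] = total.get(i, 0.0) + v
    return total


def compression_ratio(num_coords, k, value_bits, index_bits, baseline_bits):
    """Uncompressed bits divided by bits needed for k (index, value) pairs."""
    return (num_coords * baseline_bits) / (k * (value_bits + index_bits))


# Hypothetical gradients on two workers (illustrative values only).
worker_a = [0.9, -0.1, 0.05, 0.7, 0.0, -0.02, 0.03, 0.01]
worker_b = [0.02, 0.8, -0.6, 0.01, 0.0, 0.04, -0.03, 0.05]

k = 2
sparse_a = top_k_sparsify(worker_a, k)
sparse_b = top_k_sparsify(worker_b, k)

# The workers selected disjoint coordinates, so the sum carries more than k
# entries and cannot be reduced in place as a fixed-size buffer.
summed = aggregate_sparse([sparse_a, sparse_b])

# Assumed encoding: 16-bit values plus 32-bit indices per kept entry.
ratio_vs_fp32 = compression_ratio(len(worker_a), k, 16, 32, 32)
ratio_vs_fp16 = compression_ratio(len(worker_a), k, 16, 32, 16)

print(len(summed), round(ratio_vs_fp32, 2), round(ratio_vs_fp16, 2))
```

In this toy setting the summed gradient has twice as many nonzeros as either input, and the reported compression ratio halves when the baseline moves from 32-bit to 16-bit. Neither number says anything about end-to-end training time or accuracy, which is the metric the abstract argues for.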
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
