L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
Mohammadreza Alimohammadi, Ilia Markov, Elias Frantar, Dan Alistarh

TL;DR
L-GreCo is a dynamic layerwise compression framework for distributed deep learning. By adaptively tuning compression parameters per layer, it improves training speed and compression efficiency without losing accuracy.
Contribution
It introduces an adaptive algorithm that adjusts the compression level of each layer during training, making existing compression methods more effective.
Findings
Achieves up to 2.5x training speedup.
Provides up to 5x compression improvement.
Maintains full model accuracy.
Abstract
Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model's layers dynamically during training, improving the overall compression, while leading to substantial speedups, without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm, which automatically…
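To make the idea of layerwise-adaptive compression concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the truncated abstract above does not spell out. It assumes top-k sparsification as the compressor, a squared-error budget shared across layers, and a simple greedy selection rule. The names `topk_error`, `kept_count`, `choose_densities`, and the toy layers are all hypothetical.

```python
from typing import Dict, List


def kept_count(grad: List[float], density: float) -> int:
    """Number of entries top-k sparsification keeps at the given density."""
    return max(1, round(density * len(grad)))


def topk_error(grad: List[float], density: float) -> float:
    """Squared L2 error from dropping all but the largest-magnitude entries."""
    n_dropped = len(grad) - kept_count(grad, density)
    return sum(sorted(g * g for g in grad)[:n_dropped])


def choose_densities(
    grads: Dict[str, List[float]],
    levels: List[float],
    error_budget: float,
) -> Dict[str, float]:
    """Greedy heuristic (an assumption, not L-GreCo itself): repeatedly lower
    the density of the layer that saves the most entries per unit of added
    error, as long as the total error stays within the budget."""
    levels = sorted(levels, reverse=True)      # densest (least lossy) first
    idx = {name: 0 for name in grads}          # current level index per layer
    used = sum(topk_error(g, levels[0]) for g in grads.values())

    while True:
        best = None
        for name, g in grads.items():
            i = idx[name]
            if i + 1 >= len(levels):
                continue                       # already at the sparsest level
            extra = topk_error(g, levels[i + 1]) - topk_error(g, levels[i])
            saved = kept_count(g, levels[i]) - kept_count(g, levels[i + 1])
            if saved <= 0 or used + extra > error_budget:
                continue                       # no gain, or budget exceeded
            score = saved / (extra + 1e-12)
            if best is None or score > best[0]:
                best = (score, name, extra)
        if best is None:
            break
        _, name, extra = best
        idx[name] += 1
        used += extra

    return {name: levels[i] for name, i in idx.items()}


if __name__ == "__main__":
    # Toy layers: "a" is dominated by one entry, "b" has uniform entries.
    toy = {"a": [10.0] + [0.01] * 7, "b": [1.0] * 4}
    print(choose_densities(toy, [1.0, 0.5, 0.25], error_budget=0.01))
```

In this toy setup, layer `a` tolerates aggressive sparsification because almost all of its gradient mass sits in one entry, while layer `b` stays uncompressed because dropping any of its entries would exceed the budget. A uniform setting would either over-compress `b` or under-compress `a`, which is the heterogeneity the abstract points to.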
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Medical Image Segmentation Techniques
