CGX: Adaptive System Support for Communication-Efficient Deep Learning
Ilia Markov, Hamidreza Ramezanikebrya, Dan Alistarh

TL;DR
CGX is a system for efficient compressed communication in deep learning training. It reduces hardware costs and improves scalability without requiring major code changes.
Contribution
It presents a novel framework combining system-level communication stack redesign and adaptive compression techniques for scalable, cost-effective deep learning training.
Findings
Up to 3X speedup on multi-GPU nodes with commodity hardware
Order-of-magnitude improvement in multi-node training
Negligible impact on model accuracy
Abstract
The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth overprovisioning. Overprovisioning comes at a cost: there is an order of magnitude price difference between "cloud-grade" servers with such support, relative to their popular "consumer-grade" counterparts, although single server-grade and consumer-grade GPUs can have similar computational envelopes. In this paper, we show that the costly hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for compressed communication in ML applications, for both multi-GPU…
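To make "compressed communication" concrete, here is a minimal sketch of one generic gradient-compression technique: stochastic uniform quantization. It is an illustration only, not CGX's implementation or API. The function names `quantize` and `dequantize`, the 4-bit default, and the single min/max scaling range are all assumptions chosen for this example.

```python
import random

def quantize(grad, bits=4, rng=None):
    """Stochastically quantize floats to 2**bits uniform levels over [min, max].

    Hypothetical helper for illustration; not part of any CGX interface.
    Returns integer codes plus the (lo, scale) metadata needed to decode.
    """
    rng = rng or random.Random(0)
    lo, hi = min(grad), max(grad)
    levels = (1 << bits) - 1                 # highest code value
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for g in grad:
        x = (g - lo) / scale                 # position in [0, levels]
        floor = int(x)
        # Round up with probability equal to the fractional part,
        # so the quantized value is unbiased in expectation.
        code = floor + (1 if rng.random() < x - floor else 0)
        codes.append(min(code, levels))      # guard against float overshoot
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    grad = [0.5, -1.25, 3.0, 0.0, 2.2]
    codes, lo, scale = quantize(grad, bits=4)
    approx = dequantize(codes, lo, scale)
    for g, a in zip(grad, approx):
        print(f"{g:+.3f} -> {a:+.3f}")
```

In this sketch each value is sent as a 4-bit code instead of a 32-bit float, so the payload shrinks by roughly 8x (plus two floats of metadata), at the cost of a per-element error of at most one quantization step. A real system would typically quantize in buckets and pack the codes into bytes before sending them.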
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
