ByteComp: Revisiting Gradient Compression in Distributed Training
Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

TL;DR
ByteComp is a framework for choosing gradient compression strategies in distributed deep learning. It models the interactions among tensors and quickly selects a near-optimal compression strategy, improving training throughput by up to 77%.
Contribution
It designs a decision tree abstraction that expresses all compression strategies and develops empirical models of tensor computation, communication, and compression that capture the interactions among tensors, enabling fast and effective strategy selection in DDL.
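A minimal Python sketch of what a decision tree over compression strategies could look like. The specific decisions (whether to compress, on which device, and with which collective) and their option labels are assumptions chosen for illustration, not ByteComp's actual tree; each root-to-leaf path is read as one complete strategy for a single tensor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """One decision point; a node without children is a leaf."""
    choice: Optional[str] = None                   # name of the decision made here
    children: Optional[Dict[str, "Node"]] = None   # option label -> subtree


def leaves(node: Node, path: Dict[str, str]) -> List[Dict[str, str]]:
    """Collect every root-to-leaf path as a {decision: option} strategy."""
    if not node.children:
        return [dict(path)]
    out = []
    for option, child in node.children.items():
        path[node.choice] = option
        out.extend(leaves(child, path))
        del path[node.choice]
    return out


LEAF = Node()
# Hypothetical per-tensor tree; decisions and options are illustrative only.
TREE = Node("compress", {
    "no": Node("comm", {"allreduce": LEAF}),
    "yes": Node("device", {
        "gpu": Node("comm", {"allgather": LEAF, "alltoall": LEAF}),
        "cpu": Node("comm", {"allgather": LEAF, "alltoall": LEAF}),
    }),
})

strategies = leaves(TREE, {})
for s in strategies:
    print(s)
```

With this assumed tree each tensor has 5 strategies, so a job with N tensors has 5^N combinations. That growth is why exhaustive enumeration does not scale and a fast selection algorithm matters.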
Findings
Up to 77% training throughput improvement
Strategy selection takes milliseconds
Selected strategies are within a few percent of optimal
Abstract
Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors. To fully unleash the benefits of GC, two questions must be addressed: 1) How to express all compression strategies and the corresponding interactions among tensors of any DDL training job? 2) How to quickly select a near-optimal compression strategy? In this paper, we propose ByteComp to answer these questions. It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable ByteComp to derive the intricate interactions among tensors. It then designs a compression decision algorithm that analyzes…
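The sketch below illustrates, under heavily simplified assumptions, how an empirical timeline model can score a per-tensor compression plan and how a plan might then be picked. The cost numbers, the `"none"`/`"gpu"` option names, the single serialized link, and the greedy one-tensor-at-a-time search are all hypothetical. The abstract above is truncated before it describes ByteComp's decision algorithm, so this should not be read as that algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical empirical profile per tensor (milliseconds):
# option name -> (compression_time, communication_time).
Profile = Dict[str, Tuple[float, float]]


def iteration_time(ready: List[float], profiles: List[Profile], plan: List[str]) -> float:
    """Build a simple timeline and return when the last transfer finishes.

    Assumptions: tensors are listed in the order they become ready during
    the backward pass, compression starts as soon as a tensor is ready, and
    all transfers share one link, so a transfer starts only after both its
    own compression and the previous transfer have finished.
    """
    link_free = 0.0
    for t_ready, prof, option in zip(ready, profiles, plan):
        compress_ms, comm_ms = prof[option]
        start = max(t_ready + compress_ms, link_free)
        link_free = start + comm_ms
    return link_free


def greedy_plan(ready: List[float], profiles: List[Profile]) -> List[str]:
    """Fix one tensor's option at a time, keeping the option that minimizes
    the modeled iteration time; undecided tensors stay uncompressed ("none")."""
    plan = ["none"] * len(profiles)
    for i, prof in enumerate(profiles):
        plan[i] = min(
            prof,
            key=lambda opt: iteration_time(ready, profiles, plan[:i] + [opt] + plan[i + 1:]),
        )
    return plan


# Toy inputs: three tensors that become ready at 0, 1 and 2 ms.
ready = [0.0, 1.0, 2.0]
profiles = [
    {"none": (0.0, 4.0), "gpu": (0.5, 1.0)},  # large tensor: compression pays off
    {"none": (0.0, 0.2), "gpu": (1.5, 0.1)},  # small tensor: compression overhead dominates
    {"none": (0.0, 2.0), "gpu": (0.5, 0.5)},
]

baseline = iteration_time(ready, profiles, ["none"] * 3)
plan = greedy_plan(ready, profiles)
print(plan)  # ['gpu', 'none', 'gpu']
print(baseline, iteration_time(ready, profiles, plan))
```

The greedy search evaluates only a few plans per tensor instead of every combination, so it is fast, but in general it carries no optimality guarantee. The toy example also shows one interaction among tensors: whether compressing one tensor helps depends on when the link is freed by the tensors before it.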
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Tensor decomposition and applications
