ByteComp: Revisiting Gradient Compression in Distributed Training

Zhuang Wang; Haibin Lin; Yibo Zhu; T. S. Eugene Ng

arXiv:2205.14465 · cs.LG · June 8, 2022 · 1 citation

TL;DR

ByteComp is a framework for choosing gradient compression strategies in distributed deep learning (DDL). It models how tensors interact during computation, communication, and compression, quickly selects a near-optimal compression strategy, and improves training throughput by up to 77%.

Contribution

ByteComp develops a decision tree abstraction that expresses all compression strategies, plus empirical models that capture the interactions among tensors. Together, these enable fast and effective strategy selection in DDL.
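
One way to picture the decision tree abstraction is as a per-tensor tree whose root-to-leaf paths are complete compression choices. The Python sketch below assumes a hypothetical three-level tree (compress or not, compress on GPU or CPU, which collective to use). The levels, option names, and the `TensorOption`, `enumerate_leaves`, and `count_strategies` helpers are illustrative assumptions, not ByteComp's actual tree.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class TensorOption:
    """One leaf of a hypothetical per-tensor decision tree."""
    compress: bool
    device: Optional[str]  # where compression runs; None if uncompressed
    collective: str        # communication primitive used for this tensor


def enumerate_leaves() -> List[TensorOption]:
    """Walk the hypothetical decision tree and return every leaf."""
    leaves = [TensorOption(compress=False, device=None, collective="allreduce")]
    # Outputs of many compressors (e.g. top-k sparsification) cannot be summed
    # directly by allreduce, so this sketch assumes gather-style collectives
    # once compression is chosen.
    for device, collective in product(("gpu", "cpu"), ("allgather", "alltoall")):
        leaves.append(TensorOption(compress=True, device=device, collective=collective))
    return leaves


def count_strategies(num_tensors: int) -> int:
    """A job-level strategy picks one leaf independently for each tensor."""
    return len(enumerate_leaves()) ** num_tensors


for leaf in enumerate_leaves():
    print(leaf)
print(len(enumerate_leaves()))  # 5
print(count_strategies(10))     # 9765625
```

Because a job-level strategy picks one leaf per tensor, the search space grows exponentially with the number of tensors (about 9.8 million strategies for ten tensors even in this toy tree), which is why fast strategy selection matters.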

Findings

1. Up to 77% higher training throughput
2. Strategy selection takes milliseconds
3. Selected strategies are within a few percent of optimal

Abstract

Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors. To fully unleash the benefits of GC, two questions must be addressed: 1) How to express all compression strategies and the corresponding interactions among tensors of any DDL training job? 2) How to quickly select a near-optimal compression strategy? In this paper, we propose ByteComp to answer these questions. It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable ByteComp to derive the intricate interactions among tensors. It then designs a compression decision algorithm that analyzes…
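
The abstract describes empirical models that timeline tensor computation, communication, and compression, followed by an algorithm that picks a strategy. Below is a minimal Python sketch of that idea under loudly stated assumptions: linear cost models with made-up constants (`LINK_LATENCY_MS`, `LINK_MS_PER_MB`, `COMPRESS_MS_PER_MB`, and `COMPRESSION_RATIO` are hypothetical), a single network link that sends gradients in the order the backward pass produces them, compression that runs on the compute stream, and a simple greedy bit-flip search (`greedy_select`) standing in for ByteComp's own decision algorithm, which the truncated abstract does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tensor:
    name: str
    size_mb: float      # gradient size in MB
    backward_ms: float  # time to compute this gradient during the backward pass


# Hypothetical empirical-model constants (illustrative, not measured values).
LINK_LATENCY_MS = 0.05    # fixed per-message network latency
LINK_MS_PER_MB = 0.8      # network transfer cost per MB
COMPRESS_MS_PER_MB = 0.1  # compression cost per MB of the original gradient
COMPRESSION_RATIO = 0.01  # compressed size / original size


def comm_ms(size_mb: float) -> float:
    """Linear communication model: fixed latency plus a per-MB cost."""
    return LINK_LATENCY_MS + LINK_MS_PER_MB * size_mb


def iteration_ms(tensors: List[Tensor], compress: List[bool]) -> float:
    """Timeline one backward pass whose communication overlaps with compute.

    Tensors are listed in the order the backward pass produces them. One network
    link sends them in that order; compression runs on the compute stream, so it
    delays every later gradient. Decompression is ignored in this sketch.
    """
    compute_clock = 0.0  # time at which the compute stream becomes free
    link_free = 0.0      # time at which the network link becomes free
    for tensor, use_gc in zip(tensors, compress):
        compute_clock += tensor.backward_ms
        size = tensor.size_mb
        if use_gc:
            compute_clock += COMPRESS_MS_PER_MB * tensor.size_mb
            size *= COMPRESSION_RATIO
        start = max(compute_clock, link_free)  # wait for both the data and the link
        link_free = start + comm_ms(size)
    return max(compute_clock, link_free)


def greedy_select(tensors: List[Tensor]) -> List[bool]:
    """Greedy stand-in for a compression decision algorithm (not ByteComp's).

    Repeatedly flips single per-tensor decisions while doing so strictly
    shortens the modeled iteration time.
    """
    choice = [False] * len(tensors)
    best = iteration_ms(tensors, choice)
    improved = True
    while improved:
        improved = False
        for i in range(len(tensors)):
            trial = choice.copy()
            trial[i] = not trial[i]
            trial_ms = iteration_ms(tensors, trial)
            if trial_ms < best:
                choice, best, improved = trial, trial_ms, True
    return choice


# Toy job: a large, cheap-to-compute gradient followed by a small, slow one.
job = [Tensor("fc", size_mb=40.0, backward_ms=2.0),
       Tensor("conv", size_mb=4.0, backward_ms=60.0)]
print(greedy_select(job))  # [False, True]
```

In this toy job the large `fc` gradient stays uncompressed because its transfer is already hidden behind the long `conv` backward step, and compressing it would only delay `conv`. Compressing `conv`, by contrast, shortens the exposed tail of the timeline. The best choice for one tensor depends on the others, which is the kind of interaction among tensors the abstract refers to.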

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Tensor decomposition and applications