Empowering Distributed Training with Sparsity-driven Data Synchronization
Zhuang Wang, Zhaozhuo Xu, Jingyi Xi, Yuke Wang, Anshumali Shrivastava, T. S. Eugene Ng

TL;DR
This paper introduces Zen, a new gradient synchronization system that leverages tensor sparsity to significantly improve communication efficiency and training speed in distributed deep learning.
Contribution
The paper analyzes sparse tensor characteristics, explores optimal communication schemes, and develops Zen, a holistic system that accelerates distributed training by exploiting sparsity.
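Sparsity only helps communication if the sparse encoding is actually smaller than the dense tensor. The sketch below is a rough illustration, not taken from the paper. It compares sending a gradient densely against sending it in coordinate (COO) format as (index, value) pairs, assuming 4-byte indices and 4-byte float32 values. The function names are hypothetical. Under these assumptions COO only wins below 50% density, which is one reason the best communication scheme depends on how sparse the tensors really are.

```python
# Hypothetical cost model (illustrative assumption, not from the paper):
# bytes needed to send one gradient tensor densely or as COO (index, value) pairs.


def dense_bytes(num_elements: int, value_bytes: int = 4) -> int:
    """Dense encoding sends every element, zero or not."""
    return num_elements * value_bytes


def coo_bytes(nnz: int, index_bytes: int = 4, value_bytes: int = 4) -> int:
    """COO encoding sends one index and one value per non-zero element."""
    return nnz * (index_bytes + value_bytes)


def break_even_density(index_bytes: int = 4, value_bytes: int = 4) -> float:
    """Density at which COO and dense encodings have the same size."""
    return value_bytes / (index_bytes + value_bytes)


if __name__ == "__main__":
    n = 10_000_000  # elements in a hypothetical embedding gradient
    for density in (0.01, 0.10, 0.50, 0.80):
        nnz = int(n * density)
        print(f"density={density:.2f} dense={dense_bytes(n):,} B coo={coo_bytes(nnz):,} B")
    print("break-even density:", break_even_density())
```

At 1% density the COO payload is 50 times smaller than the dense one, while at 80% it is larger. Other encodings, such as bitmaps, shift the break-even point.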
Findings
Zen achieves up to 5.09x reduction in communication time.
Zen improves training throughput by up to 2.48x.
The system effectively leverages tensor sparsity for distributed training.
Abstract
Distributed training is the de facto standard to scale up the training of deep learning models with multiple GPUs. Its performance bottleneck lies in communications for gradient synchronization. Although high tensor sparsity is widely observed, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to bridge this gap. We first analyze the characteristics of sparse tensors in popular models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal ones. These findings give a new understanding and inspire us to develop a holistic gradient synchronization system called Zen for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to the state-of-the-art…
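One dimension of the design space the abstract mentions is how sparse gradients from different workers get combined. Below is a minimal single-process simulation of one generic baseline: every worker allgathers its COO pairs and then sums them locally. It is not Zen's scheme, and `to_coo` and `sparse_allgather_sum` are hypothetical names. The sketch makes one property of sparse aggregation visible. The support of the summed gradient is the union of the workers' supports, so the aggregated tensor is denser than any single worker's.

```python
import numpy as np


def to_coo(dense: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (indices, values) of the non-zero entries of a 1-D gradient."""
    idx = np.flatnonzero(dense)
    return idx, dense[idx]


def sparse_allgather_sum(worker_grads: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Simulate an allgather of COO gradients followed by a local sum.

    Each worker would receive every (index, value) pair; duplicates are merged here.
    """
    coos = [to_coo(g) for g in worker_grads]
    all_idx = np.concatenate([i for i, _ in coos])
    all_val = np.concatenate([v for _, v in coos])
    # Merge duplicate indices: the output support is the union of the inputs.
    uniq, inverse = np.unique(all_idx, return_inverse=True)
    summed = np.zeros(len(uniq), dtype=all_val.dtype)
    np.add.at(summed, inverse, all_val)  # unbuffered add handles repeated indices
    return uniq, summed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, workers, density = 100_000, 8, 0.01
    grads = []
    for _ in range(workers):
        g = np.zeros(n)
        hot = rng.choice(n, size=int(n * density), replace=False)
        g[hot] = rng.standard_normal(len(hot))
        grads.append(g)
    idx, val = sparse_allgather_sum(grads)
    assert np.allclose(np.sum(grads, axis=0)[idx], val)  # matches a dense sum
    print(f"per-worker density: {density:.3f}, aggregated density: {len(idx) / n:.3f}")
```

With 8 workers that each touch a random 1% of 100,000 entries, the aggregated density should land near 1 - 0.99^8, roughly 7.7%, assuming the non-zero positions are independent and uniform. Real gradients whose hot indices overlap across workers would densify less.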
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
