TL;DR
Auto-Precision Scaling (APS) makes low-precision gradient communication in distributed deep learning more accurate, cutting bandwidth requirements with little or no loss in model accuracy, and is implemented in an open-source system integrated with PyTorch.
Contribution
The paper introduces APS, an algorithm that improves the accuracy of low-precision gradient communication in distributed training, along with a hybrid-precision technique that avoids accuracy loss, a performance model for evaluating the method, and an open-source simulation system.
Findings
For many applications, APS trains state-of-the-art models with 8-bit gradients at no or only a tiny accuracy loss (<0.05%).
APS provides significant speedup over existing methods.
The CPD system allows flexible simulation of low-precision training.
Abstract
It has been reported that the communication cost for synchronizing gradients can be a bottleneck, which limits the scalability of distributed deep learning. Using low-precision gradients is a promising technique for reducing the bandwidth requirement. In this work, we propose Auto-Precision Scaling (APS), an algorithm that can improve the accuracy when we communicate gradients using low-precision floating-point values. APS can improve the accuracy for all precisions with a trivial communication cost. Our experimental results show that for many applications, APS can train state-of-the-art models with 8-bit gradients with no or only a tiny accuracy loss (<0.05%). Furthermore, we can avoid any accuracy loss by designing a hybrid-precision technique. Finally, we propose a performance model to evaluate the proposed method. Our experimental results show that APS can get a significant speedup over…
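The abstract does not spell out how the scaling works, so the following is only a minimal sketch of the general idea of scaling gradients before a low-precision exchange, not the paper's implementation. It assumes a hypothetical 8-bit float layout (1 sign, 5 exponent, 2 mantissa bits, no subnormals, saturating overflow), a power-of-two scale chosen from the largest gradient magnitude across workers, and an all-reduce simulated with a plain loop. All function names are illustrative.

```python
import math


def quantize(x, exp_bits=5, man_bits=2):
    """Round x to a simulated low-precision float.

    Assumed layout (hypothetical): 1 sign bit, exp_bits exponent bits,
    man_bits mantissa bits. Simplifications: no subnormals (tiny values
    flush to zero) and overflow saturates at the largest magnitude.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    max_exp, min_exp = bias, 1 - bias
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = e - 1                      # rewrite as sig * 2**exp with 1 <= sig < 2
    if exp < min_exp:
        return 0.0                   # underflow: the gradient is lost
    steps = 2 ** man_bits
    sig = round(m * 2 * steps) / steps
    if sig >= 2.0:                   # mantissa rounding carried into the exponent
        sig, exp = 1.0, exp + 1
    if exp > max_exp:                # overflow: clamp to the largest value
        sig, exp = 2.0 - 1.0 / steps, max_exp
    return math.copysign(sig * 2.0 ** exp, x)


def aps_scale_exponent(worker_grads, exp_bits=5):
    """Pick k so that even the worst-case sum, scaled by 2**k, stays in range.

    Assumption: in a real system the global maximum would be obtained by
    exchanging one scalar per tensor; here it is computed directly.
    """
    max_exp = 2 ** (exp_bits - 1) - 1
    n = len(worker_grads)
    g_max = max(abs(g) for grads in worker_grads for g in grads)
    if g_max == 0.0:
        return 0
    return max_exp - math.ceil(math.log2(n * g_max))


def allreduce_low_precision(worker_grads, k, exp_bits=5, man_bits=2):
    """Simulated sum all-reduce where every value on the wire is quantized."""
    factor = 2.0 ** k
    total = [0.0] * len(worker_grads[0])
    for grads in worker_grads:
        for i, g in enumerate(grads):
            sent = quantize(g * factor, exp_bits, man_bits)
            total[i] = quantize(total[i] + sent, exp_bits, man_bits)
    return [t / factor for t in total]   # undo the scaling in full precision


# Two workers holding small gradients, as is common late in training.
worker_grads = [[1e-6, -2e-6], [3e-6, 1e-6]]
unscaled = allreduce_low_precision(worker_grads, k=0)
scaled = allreduce_low_precision(worker_grads, k=aps_scale_exponent(worker_grads))
```

In this toy setup the unscaled exchange underflows and every gradient is lost, while the scaled exchange recovers sums close to the true values; the remaining error comes from the 2-bit mantissa. A power-of-two factor is used so that scaling and unscaling only shift the exponent and add no rounding error of their own.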
