MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
Daegun Yoon, Sangyoon Oh

TL;DR
MiCRO is a novel gradient sparsification method that significantly reduces communication costs in distributed DNN training by partitioning gradients and accurately estimating thresholds, achieving near-zero cost sparsification.
Contribution
MiCRO introduces a scalable, efficient gradient sparsification technique that minimizes computational overhead and communication traffic, improving upon existing methods.
Findings
Outperforms state-of-the-art sparsifiers in convergence rate
Achieves near-zero cost gradient sparsification
Keeps communication traffic at the level the user requests
Abstract
Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or an increase in communication traffic. In particular, the increase in communication traffic is caused by gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the…
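The partition-then-select idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`partition_bounds`, `select_in_partition`, `sparsified_aggregate`), the contiguous equal-size partitioning, and the aggregation step (union the selected indices, then sum every worker's values at those indices) are assumptions made for illustration. The threshold is taken as a fixed input, since the abstract's description of the threshold estimation is cut off.

```python
import numpy as np


def partition_bounds(n_grads, n_workers):
    """Split indices [0, n_grads) into n_workers contiguous, disjoint ranges."""
    edges = [i * n_grads // n_workers for i in range(n_workers + 1)]
    return list(zip(edges[:-1], edges[1:]))


def select_in_partition(grad, bounds, threshold):
    """Return indices inside this worker's partition whose |gradient| >= threshold."""
    start, end = bounds
    local_hits = np.nonzero(np.abs(grad[start:end]) >= threshold)[0]
    return local_hits + start  # shift back to global indices


def sparsified_aggregate(worker_grads, threshold):
    """Assumed aggregation: union the per-partition indices, then sum all workers' values there."""
    bounds = partition_bounds(worker_grads[0].size, len(worker_grads))
    picked = [select_in_partition(g, b, threshold) for g, b in zip(worker_grads, bounds)]
    indices = np.concatenate(picked)
    values = np.sum([g[indices] for g in worker_grads], axis=0)
    return indices, values


# Two simulated workers, six gradients each (made-up numbers).
grads = [
    np.array([0.9, 0.1, -0.7, 0.8, 0.8, 0.8]),
    np.array([0.9, 0.9, 0.9, 0.2, -0.6, 0.1]),
]
indices, values = sparsified_aggregate(grads, threshold=0.5)
print(indices, values)
```

Because each worker only searches its own partition, no index can be selected by two workers, so in this sketch the number of aggregated indices equals the sum of the per-worker counts. That is the sense in which the abstract describes the aggregated gradients as free from gradient build-up.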
Taxonomy
Topics: Machine Learning and ELM · Advanced Neural Network Applications · COVID-19 diagnosis using AI
Methods: Gradient Sparsification
