MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

Daegun Yoon; Sangyoon Oh

arXiv:2310.00967·cs.LG·February 21, 2024

TL;DR

MiCRO is a novel gradient sparsification method that significantly reduces communication costs in distributed DNN training by partitioning gradients and accurately estimating thresholds, achieving near-zero cost sparsification.

Contribution

MiCRO introduces a scalable, efficient gradient sparsification technique that minimizes computational overhead and communication traffic, improving upon existing methods.

Findings

01 Outperforms state-of-the-art sparsifiers in convergence rate

02 Achieves near-zero cost gradient sparsification

03 Effectively maintains communication traffic as per user requirements

Abstract

Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers scale poorly because of the high computational cost of gradient selection and/or an increase in communication traffic. In particular, the increase in communication traffic is caused by gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its own partition, so the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the…
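The partitioning idea in the abstract can be illustrated with a small, pure-Python sketch. It is not the authors' implementation: the helper names (`partition_bounds`, `select_in_partition`, `select_global`, `union_size`), the toy gradient values, and the fixed threshold of 0.5 are all assumptions made for illustration, and the threshold estimation step is left out because the abstract above is truncated before describing it. The sketch only contrasts two ways of choosing indices: every worker thresholding the whole vector (where the union of selected indices can grow with the number of workers, i.e. gradient build-up) versus every worker thresholding only its own disjoint partition.

```python
from typing import Dict, List, Sequence


def partition_bounds(n: int, num_workers: int) -> List[range]:
    """Split indices 0..n-1 into num_workers contiguous, disjoint ranges."""
    base, extra = divmod(n, num_workers)
    bounds, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder
        bounds.append(range(start, start + size))
        start += size
    return bounds


def select_in_partition(grad: Sequence[float], part: range,
                        threshold: float) -> Dict[int, float]:
    """Keep entries inside this worker's partition with |g| >= threshold."""
    return {i: grad[i] for i in part if abs(grad[i]) >= threshold}


def select_global(grad: Sequence[float], threshold: float) -> Dict[int, float]:
    """Baseline: threshold over the whole vector, ignoring partitions."""
    return {i: g for i, g in enumerate(grad) if abs(g) >= threshold}


def union_size(selections: List[Dict[int, float]]) -> int:
    """Number of distinct indices that would have to be aggregated."""
    indices = set()
    for sel in selections:
        indices.update(sel)
    return len(indices)


# Toy local gradients for two workers (hypothetical values).
worker_grads = [
    [0.9, 0.1, 0.0, 0.2, 0.8, 0.1, 0.0, 0.3],
    [0.1, 0.7, 0.0, 0.1, 0.1, 0.9, 0.6, 0.0],
]
threshold = 0.5  # assumed fixed here; the paper estimates it

parts = partition_bounds(len(worker_grads[0]), len(worker_grads))
global_sel = [select_global(g, threshold) for g in worker_grads]
part_sel = [select_in_partition(g, p, threshold)
            for g, p in zip(worker_grads, parts)]

print("global union size:", union_size(global_sel))
print("partitioned union size:", union_size(part_sel))
```

With these toy values, the global baseline selects at most three indices per worker, yet five distinct indices have to be aggregated. The partitioned variant aggregates exactly as many indices as the workers selected in total (three), because the partitions cannot overlap.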

Code & Models

Repositories

kljp/micro (PyTorch, official)

Taxonomy

Topics: Machine Learning and ELM · Advanced Neural Network Applications · COVID-19 diagnosis using AI

Methods: Gradient Sparsification