Novel Gradient Sparsification Algorithm via Bayesian Inference

Ali Bereyhi; Ben Liang; Gary Boudreau; Ali Afana

arXiv:2409.14893·cs.LG·September 24, 2024

TL;DR

This paper introduces RegTop-$k$, a Bayesian inference-based gradient sparsification method that improves convergence and accuracy in distributed training by controlling error accumulation.

Contribution

It proposes a novel Bayesian inference approach to gradient sparsification, optimizing the sparsification mask to enhance convergence and accuracy.

Findings

01

RegTop-$k$ achieves about 8% higher accuracy than standard Top-$k$ at 0.1% sparsification.

02

The algorithm controls the learning-rate scaling introduced by error accumulation, which can otherwise deteriorate convergence.

03

Numerical experiments validate the improved performance on ResNet-18 with CIFAR-10.

Abstract

Error accumulation is an essential component of the Top-$k$ sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slow-down of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm called regularized Top-$k$ (RegTop-$k$) that controls the learning rate scaling of error accumulation. The algorithm is developed by looking at the gradient sparsification as an inference problem and determining a Bayesian optimal sparsification mask via maximum-a-posteriori estimation. It utilizes past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the local gradient entries. Numerical experiments with ResNet-18 on CIFAR-10 show that at $0.1\%$ sparsification, RegTop-$k$ achieves about $8\%$ higher accuracy than standard Top-$k$.
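To make the baseline concrete, the following is a minimal sketch of standard Top-$k$ sparsification with error accumulation on a single worker. It is not RegTop-$k$: the abstract does not give the posterior statistics or the regularized mask, so none of that is reproduced here. The function names `top_k_mask` and `sparsify_with_error_accumulation`, and the toy gradients, are assumptions made for illustration only.

```python
import numpy as np


def top_k_mask(vec, k):
    """Return a 0/1 mask selecting the k largest-magnitude entries of vec."""
    mask = np.zeros_like(vec)
    if k > 0:
        # argpartition places the k largest |entries| in the last k positions
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        mask[idx] = 1.0
    return mask


def sparsify_with_error_accumulation(grad, error, k):
    """One worker step of standard Top-k with error accumulation (illustrative).

    grad  : local gradient at this iteration
    error : residual that was not sent in earlier iterations
    Returns the sparse message to send and the updated residual.
    """
    accumulated = grad + error           # add back what was dropped before
    mask = top_k_mask(accumulated, k)
    sparse = mask * accumulated          # only k entries are communicated
    new_error = accumulated - sparse     # everything else is carried forward
    return sparse, new_error


# Toy run with k = 1 on a 5-dimensional gradient (hypothetical values).
error = np.zeros(5)

g1 = np.array([0.1, -2.0, 0.3, 0.05, 1.0])
sent1, error = sparsify_with_error_accumulation(g1, error, k=1)
# Entry 1 (value -2.0) is sent; the other entries are stored in `error`.

g2 = np.array([0.1, -0.2, 0.3, 0.05, 1.0])
sent2, error = sparsify_with_error_accumulation(g2, error, k=1)
# Entry 4 is sent with value 2.0: its current gradient entry (1.0) plus the
# residual accumulated from the first step (1.0).
```

In the second step the transmitted entry is twice the current gradient entry, which is one way to see the implicit learning-rate scaling that the abstract attributes to error accumulation.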

Taxonomy

Methods: Gradient Sparsification