Regularized Top-$k$: A Bayesian Framework for Gradient Sparsification
Ali Bereyhi, Ben Liang, Gary Boudreau, Ali Afana

TL;DR
This paper introduces a Bayesian regularization approach to gradient sparsification called RegTop-k, which improves convergence and performance in distributed training by controlling error accumulation.
Contribution
It develops a novel Bayesian-based sparsification scheme that regularizes Top-k, enhancing convergence and accuracy at high compression ratios in distributed learning.
Findings
RegTop-k converges to the global optimum faster than Top-k.
RegTop-k outperforms Top-k in distributed training of ResNet-18 on CIFAR-10.
The benefit of the proposed regularization is most pronounced at high compression ratios.
Abstract
Error accumulation is effective for gradient sparsification in distributed settings: initially-unselected gradient entries are eventually selected as their accumulated error exceeds a certain level. The accumulation essentially behaves as a scaling of the learning rate for the selected entries. Although this property prevents the slow-down of lateral movements in distributed gradient descent, it can deteriorate convergence in some settings. This work proposes a novel sparsification scheme that controls the learning rate scaling of error accumulation. The development of this scheme follows two major steps: first, gradient sparsification is formulated as an inverse probability (inference) problem, and the Bayesian optimal sparsification mask is derived as a maximum-a-posteriori estimator. Using the prior distribution inherited from Top-k, we derive a new sparsification algorithm which can…
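The error-accumulation mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration of plain Top-k with error accumulation on a single worker, not the RegTop-k algorithm itself, whose details are not given in the text above. The function names, the toy gradient, and the choice of k = 1 are assumptions made for the example.

```python
import numpy as np

def top_k_mask(x, k):
    """Return a 0/1 mask selecting the k largest-magnitude entries of x."""
    mask = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    mask[idx] = 1.0
    return mask

def sparsify_with_error_accumulation(grad, error, k):
    """One round of Top-k sparsification with error accumulation.

    Unsent entries are carried over in `error` and added to the
    next gradient before the mask is chosen.
    """
    accumulated = grad + error
    mask = top_k_mask(accumulated, k)
    sparse = mask * accumulated        # what the worker transmits
    new_error = accumulated - sparse   # what it keeps for later rounds
    return sparse, new_error

# Toy run (hypothetical values): a constant gradient, one entry sent per round.
grad = np.array([1.0, 0.4, 0.1])
error = np.zeros_like(grad)
sent_index, sent_value = [], []
for _ in range(4):
    sparse, error = sparsify_with_error_accumulation(grad, error, k=1)
    i = int(np.argmax(np.abs(sparse)))
    sent_index.append(i)
    sent_value.append(float(sparse[i]))

print(sent_index)  # indices of the transmitted entry in each round
print(sent_value)  # transmitted values
```

In this toy run the second entry is skipped for two rounds and then sent with roughly three times its per-round gradient, which is the learning-rate scaling effect the abstract refers to.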
Taxonomy
Topics: Medical Image Segmentation Techniques · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
Methods: Gradient Sparsification
