Regularized Top-$k$: A Bayesian Framework for Gradient Sparsification

Ali Bereyhi; Ben Liang; Gary Boudreau; Ali Afana

arXiv:2501.05633·cs.LG·February 17, 2026

Open Access

TL;DR

This paper introduces a Bayesian regularization approach to gradient sparsification called RegTop-k, which improves convergence and performance in distributed training by controlling error accumulation.

Contribution

The paper develops a novel Bayesian sparsification scheme that regularizes Top-k, improving convergence and accuracy at high compression ratios in distributed learning.

Findings

01. RegTop-k converges to the global optimum faster than Top-k.

02. RegTop-k outperforms Top-k in distributed training of ResNet-18 on CIFAR-10.

03. Higher compression ratios benefit from the proposed regularization.

Abstract

Error accumulation is effective for gradient sparsification in distributed settings: initially-unselected gradient entries are eventually selected as their accumulated error exceeds a certain level. The accumulation essentially behaves as a scaling of the learning rate for the selected entries. Although this property prevents the slow-down of lateral movements in distributed gradient descent, it can deteriorate convergence in some settings. This work proposes a novel sparsification scheme that controls the learning rate scaling of error accumulation. The development of this scheme follows two major steps: first, gradient sparsification is formulated as an inverse probability (inference) problem, and the Bayesian optimal sparsification mask is derived as a maximum-a-posteriori estimator. Using the prior distribution inherited from Top-k, we derive a new sparsification algorithm which can…
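The error-accumulation behavior described in the abstract can be illustrated with a minimal sketch of plain Top-k sparsification on a single worker. This is the standard baseline, not the RegTop-k algorithm from the paper; the function names `top_k_mask` and `sparsify_with_error_accumulation` and the toy gradient values are assumptions made for illustration only.

```python
import numpy as np

def top_k_mask(vec, k):
    """Return a 0/1 mask selecting the k largest-magnitude entries of vec."""
    mask = np.zeros_like(vec)
    if k > 0:
        # argpartition places the k largest |values| in the last k positions
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        mask[idx] = 1.0
    return mask

def sparsify_with_error_accumulation(grad, error, k):
    """One round of Top-k with error accumulation (hypothetical helper).

    The unsent residual from earlier rounds is added to the fresh gradient
    before selection; whatever is not selected is carried to the next round.
    """
    accumulated = error + grad
    mask = top_k_mask(accumulated, k)
    sparse = mask * accumulated        # entries that are transmitted
    new_error = accumulated - sparse   # entries kept back locally
    return sparse, new_error

# Toy run: a constant gradient, one entry sent per round (k = 1).
grad = np.array([1.0, 0.4, 0.1])
error = np.zeros_like(grad)
sent = []
for step in range(4):
    sparse, error = sparsify_with_error_accumulation(grad, error, k=1)
    sent.append(sparse.copy())
    print(step, sparse)
```

In this toy run the second entry is not selected in the first two rounds, and when it is finally selected in the third round it is transmitted with about three times its per-round gradient. That is the learning-rate scaling effect the abstract refers to, and it is the quantity the proposed scheme aims to control.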

Taxonomy

Topics: Medical Image Segmentation Techniques · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis

Methods: Gradient Sparsification