An All-Reduce Compatible Top-K Compressor for Communication-Efficient Distributed Learning

Chuyan Chen; Chenyang Ma; Zhangxin Li; Yutong He; Yanjie Dong; Kun Yuan

arXiv:2510.26709·cs.LG·November 5, 2025

TL;DR

The paper introduces ARC-Top-K, a communication-efficient gradient compressor for distributed learning that keeps the selectivity of Top-K sparsification while remaining compatible with All-Reduce. It matches Top-K accuracy while reducing training time by up to 60.7%.

Contribution

ARC-Top-K is a gradient compressor that aligns sparsity patterns across nodes using a lightweight sketch of the gradient. The aligned patterns enable index-free All-Reduce, and the compressor remains provably contractive.

Findings

01  Achieves linear speedup with momentum error feedback.

02  Matches Top-K accuracy while reducing training time by up to 60.7%.

03  Provably contractive and scalable in distributed settings.
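
For reference, "contractive" is usually formalized in the gradient-compression literature as the inequality below. The notation (compressor $\mathcal{C}$, constant $\delta$, dimension $d$) is assumed here for illustration and is not taken from the paper, whose exact statement may differ.

```latex
\mathbb{E}\,\lVert \mathcal{C}(x) - x \rVert^2 \;\le\; (1 - \delta)\,\lVert x \rVert^2
\quad \text{for all } x \in \mathbb{R}^d, \ \text{for some } \delta \in (0, 1].
```

As a sanity check on the definition, keeping the $K$ largest-magnitude entries of a single vector satisfies it with $\delta = K/d$, since the $d - K$ discarded entries carry at most a $(d - K)/d$ share of the squared norm.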

Abstract

Communication remains a central bottleneck in large-scale distributed machine learning, and gradient sparsification has emerged as a promising strategy to alleviate this challenge. However, existing gradient compressors face notable limitations: Rand-$K$ discards structural information and performs poorly in practice, while Top-$K$ preserves informative entries but loses the contraction property and requires costly All-Gather operations. In this paper, we propose ARC-Top-$K$, an All-Reduce-Compatible Top-$K$ compressor that aligns sparsity patterns across nodes using a lightweight sketch of the gradient, enabling index-free All-Reduce while preserving globally significant information. ARC-Top-$K$ is provably contractive and, when combined with momentum error feedback (EF21M), achieves linear speedup and sharper convergence rates than the original EF21M under standard assumptions.…
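
The idea of aligning sparsity patterns through a shared sketch can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not specify the sketch, so the code assumes a matrix-shaped gradient, a Gaussian random projection derived from a common seed, and row-wise selection. The function name `aligned_topk_round` and its parameters are hypothetical, and each All-Reduce is simulated by a plain average over a Python list.

```python
import numpy as np


def aligned_topk_round(grads, k, sketch_dim, seed=0):
    """Simulate one compression round over per-node gradient matrices.

    grads: list of (m, n) arrays, one per simulated node.
    Returns (sorted selected row indices, averaged compressed gradient).
    """
    m, n = grads[0].shape

    # Assumption: all nodes build the same projection from a shared seed,
    # so the projection itself never has to be communicated.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)

    # Step 1: each node sketches its local gradient into an (m, sketch_dim) array.
    sketches = [g @ proj for g in grads]

    # Step 2: simulated All-Reduce (average) of the small dense sketches.
    avg_sketch = np.mean(sketches, axis=0)

    # Step 3: every node ranks rows by the norm of the same averaged sketch,
    # so all nodes arrive at an identical index set without exchanging indices.
    row_scores = np.linalg.norm(avg_sketch, axis=1)
    rows = np.sort(np.argpartition(row_scores, -k)[-k:])

    # Step 4: simulated All-Reduce (average) of only the selected rows,
    # which form a dense (k, n) buffer with the same layout on every node.
    avg_rows = np.mean([g[rows] for g in grads], axis=0)

    compressed = np.zeros((m, n))
    compressed[rows] = avg_rows
    return rows, compressed


# Usage: 4 simulated nodes, 64 x 32 gradients, keep 8 rows, sketch width 4.
data_rng = np.random.default_rng(1)
grads = [data_rng.standard_normal((64, 32)) for _ in range(4)]
rows, compressed = aligned_topk_round(grads, k=8, sketch_dim=4)
mean_grad = np.mean(grads, axis=0)
err = np.linalg.norm(compressed - mean_grad) ** 2
print(len(rows), err <= np.linalg.norm(mean_grad) ** 2)
```

In this toy setting each node contributes a 64 x 4 sketch and an 8 x 32 block of values (512 numbers in total) rather than the full 64 x 32 gradient (2048 numbers). Both buffers are dense and identically laid out on every node, which is what lets a sum-based collective replace the All-Gather of (index, value) pairs that per-node Top-K would need.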
