Improved Convergence in Parameter-Agnostic Error Feedback through Momentum

Abdurakhmon Sadiev; Yury Demidovich; Igor Sokolov; Grigory Malinovsky; Sarit Khirirat; Peter Richtárik

arXiv:2511.14501 · math.OC · November 19, 2025

TL;DR

This paper introduces parameter-agnostic normalized error feedback (EF) algorithms with momentum for distributed training. They achieve near-optimal convergence rates without problem-specific tuning, as supported by theoretical analysis and experiments.

Contribution

The paper proposes new normalized EF algorithms with momentum that require no prior knowledge of problem parameters such as smoothness constants, which makes them more practical for distributed neural network training.

Findings

01. Normalized EF21 with heavy-ball momentum achieves a convergence rate of nearly O(1/T^{1/4}), where T is the number of iterations.

02. Without any parameter tuning, the algorithms attain convergence rates close to those of methods with carefully tuned, problem-dependent stepsizes.

03. Experiments support the theoretical convergence bounds.
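For orientation, one common way to write normalized EF21 with heavy-ball momentum is sketched below. It assumes n workers with local functions f_i, a contractive compressor \mathcal{C}, a stochastic gradient sample \xi_i^{t+1}, a stepsize \gamma_t and a momentum parameter \eta_t, with heavy-ball momentum written in exponential-moving-average form. These symbols and this arrangement are illustrative assumptions in the style of EF21 with momentum; the paper's exact notation and stepsize schedules may differ.

```latex
\begin{aligned}
x^{t+1}   &= x^t - \gamma_t \, \frac{g^t}{\|g^t\|}, \\
v_i^{t+1} &= (1 - \eta_t)\, v_i^t + \eta_t \, \nabla f_i(x^{t+1}; \xi_i^{t+1}), \\
g_i^{t+1} &= g_i^t + \mathcal{C}\big(v_i^{t+1} - g_i^t\big), \\
g^{t+1}   &= \frac{1}{n} \sum_{i=1}^{n} g_i^{t+1}.
\end{aligned}
```

Normalization fixes the length of every step at \gamma_t, so the stepsize no longer has to be matched to the gradient scale or a smoothness constant. Each worker transmits only the compressed difference \mathcal{C}(v_i^{t+1} - g_i^t).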

Abstract

Communication compression is essential for scalable distributed training of modern machine learning models, but it often degrades convergence due to the noise it introduces. Error Feedback (EF) mechanisms are widely adopted to mitigate this issue in distributed compression algorithms. Despite their popularity and training efficiency, existing distributed EF algorithms often require prior knowledge of problem parameters (e.g., smoothness constants) to fine-tune stepsizes. This limits their practical applicability, especially in large-scale neural network training. In this paper, we study normalized error feedback algorithms that combine EF with normalized updates, various momentum variants, and parameter-agnostic, time-varying stepsizes, thus eliminating the need for problem-dependent tuning. We analyze the convergence of these algorithms for minimizing smooth functions, and establish…
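The abstract describes combining EF with normalized updates, momentum, and time-varying stepsizes. Below is a minimal, hypothetical Python simulation of that combination. It assumes the EF21 update with heavy-ball momentum in exponential-moving-average form, a Top-K compressor, a toy quadratic objective f_i(x) = 0.5 * ||x - targets[i]||^2 per worker, and the illustrative schedules gamma_t = gamma0 / (t+1)^(3/4) and eta_t = 1 / (t+1)^(1/2). The names `top_k`, `normalized_ef21_hb` and `targets`, as well as these schedules, are assumptions for illustration. They are not the authors' code or their exact stepsize rules.

```python
import math
import random


def top_k(vec, k):
    """Top-K compressor: keep the k largest-magnitude entries, zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    return out


def norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def normalized_ef21_hb(targets, x0, steps, k, gamma0=1.0, noise=0.0, seed=0):
    """Hypothetical sketch of normalized EF21 with heavy-ball (EMA) momentum.

    Worker i holds f_i(x) = 0.5 * ||x - targets[i]||^2, so grad f_i(x) = x - targets[i].
    The schedules gamma_t = gamma0 / (t + 1)**0.75 and eta_t = 1 / (t + 1)**0.5 are
    illustrative parameter-agnostic choices (no smoothness constant is used);
    they are assumptions, not necessarily the paper's schedules.
    """
    rng = random.Random(seed)
    n, d = len(targets), len(x0)
    x = list(x0)

    def stoch_grad(i, point):
        # Gradient of f_i plus optional Gaussian noise to mimic stochastic gradients.
        return [point[j] - targets[i][j] + noise * rng.gauss(0.0, 1.0) for j in range(d)]

    # Initialize momentum v_i and the EF21 state g_i with an uncompressed gradient.
    v = [stoch_grad(i, x) for i in range(n)]
    g_loc = [list(vi) for vi in v]
    g = [sum(g_loc[i][j] for i in range(n)) / n for j in range(d)]

    for t in range(steps):
        gamma = gamma0 / (t + 1) ** 0.75
        eta = 1.0 / (t + 1) ** 0.5
        # Normalized step: only the direction of the aggregated estimate g is used.
        g_norm = norm(g)
        if g_norm > 0:
            x = [x[j] - gamma * g[j] / g_norm for j in range(d)]
        for i in range(n):
            grad = stoch_grad(i, x)
            # Heavy-ball momentum written as an exponential moving average.
            v[i] = [(1 - eta) * v[i][j] + eta * grad[j] for j in range(d)]
            # EF21: each worker compresses only the change v_i - g_i and "sends" it.
            c = top_k([v[i][j] - g_loc[i][j] for j in range(d)], k)
            g_loc[i] = [g_loc[i][j] + c[j] for j in range(d)]
        # Server-side aggregate of the workers' EF21 states.
        g = [sum(g_loc[i][j] for i in range(n)) / n for j in range(d)]
    return x


# Usage: three workers, 4-dimensional toy problem, Top-1 compression.
targets = [[1.0, -2.0, 0.5, 3.0], [3.0, 0.0, -1.5, 1.0], [2.0, 2.0, 1.0, -1.0]]
optimum = [sum(col) / len(col) for col in zip(*targets)]  # minimizer of the average
x = normalized_ef21_hb(targets, x0=[0.0] * 4, steps=2000, k=1)
print("distance to optimum:", norm([a - b for a, b in zip(x, optimum)]))
```

Neither schedule in this sketch uses a smoothness constant. The normalized step moves exactly gamma_t per iteration regardless of the gradient scale, and the EF21 state keeps the compressed estimate g tracking the workers' momentum vectors even though each worker transmits only one coordinate per step here.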

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data