Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum

Sarit Khirirat; Abdurakhmon Sadiev; Artem Riabinin; Eduard Gorbunov; Peter Richtárik

arXiv:2410.16871 · cs.LG · October 23, 2024

Open Access · 1 Video

TL;DR

This paper proves convergence of normalized error feedback algorithms for distributed deep learning under generalized smoothness, enabling larger stepsizes and improved performance without restrictive assumptions.

Contribution

It introduces normalized error feedback algorithms with convergence guarantees under generalized smoothness in distributed settings, removing previous strong assumptions.
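
To make the idea concrete, here is a minimal single-process sketch of what a normalized error feedback loop could look like. It assumes an EF21-style update in which each worker keeps a gradient estimate and sends only a compressed correction, and the server steps along the normalized average estimate. The function names (`top_k`, `normalized_ef21`), the Top-k compressor, and the constant stepsize are illustrative assumptions, not details taken from this page.

```python
import numpy as np


def top_k(v, k):
    """Top-k compressor (assumed for illustration): keep the k
    largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def normalized_ef21(grads, x0, stepsize, k, num_iters):
    """Hypothetical sketch of an EF21-style loop with a normalized step.

    grads: list of callables, one per worker, each returning the
           worker's local gradient at a point x.
    """
    x = np.asarray(x0, dtype=float).copy()
    # Each worker initializes its gradient estimate with a compressed gradient.
    g_local = [top_k(grad(x), k) for grad in grads]
    for _ in range(num_iters):
        g = np.mean(g_local, axis=0)  # server averages the worker estimates
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break  # no direction to step along
        # Normalized step: the move has length `stepsize` regardless of
        # how large the gradient estimate is.
        x = x - stepsize * g / norm
        # Error feedback: each worker compresses only the difference between
        # its fresh gradient and its current estimate.
        g_local = [gi + top_k(grad(x) - gi, k)
                   for gi, grad in zip(g_local, grads)]
    return x
```

Because the step length is fixed by `stepsize` alone, the update does not need a smoothness constant to stay bounded, which is one way to read the claim that stepsizes can be chosen independently of problem parameters.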

Findings

01

Achieves $O(1/\sqrt{K})$ convergence rate for nonconvex problems.

02

Enables stepsize tuning independent of problem parameters.

03

Outperforms non-normalized algorithms on various tasks.

Abstract

We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. Despite their popularity and efficiency in training deep neural networks, traditional analyses of error feedback algorithms rely on the smoothness assumption that does not capture the properties of objective functions in these problems. Rather, these problems have recently been shown to satisfy generalized smoothness assumptions, and the theoretical understanding of error feedback algorithms under these assumptions remains largely unexplored. Moreover, to the best of our knowledge, all existing analyses under generalized smoothness either i) focus on single-node settings or ii) make unrealistically strong assumptions for distributed settings, such as requiring data heterogeneity, and almost surely bounded stochastic gradient noise variance. In this paper,…
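
For orientation, generalized smoothness is commonly stated as follows for a twice-differentiable function $f$. This is the standard second-order form of the $(L_0,L_1)$ condition; the paper itself may work with a first-order variant, so treat the exact statement as an assumption.

```latex
\|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| \qquad \text{for all } x .
```

Setting $L_1 = 0$ recovers the classical smoothness assumption with constant $L_0$, so the condition allows the local curvature to grow with the gradient norm.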

Videos

Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum · SlidesLive

Taxonomy

Topics: Stochastic processes and financial applications · Numerical Methods and Algorithms · Stability and Control of Uncertain Systems

Methods: Focus