On the Limits of Momentum in Decentralized and Federated Optimization
Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone

TL;DR
This paper investigates the limitations of momentum in decentralized and federated optimization, showing that statistical heterogeneity fundamentally constrains convergence regardless of step-size schedules.
Contribution
It provides a theoretical analysis demonstrating that momentum cannot fully overcome heterogeneity in decentralized settings, and establishes convergence bounds under cyclic participation.
Findings
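Momentum remains affected by statistical heterogeneity in decentralized optimization when only some clients participate in each round (cyclic participation).
Step-size schedules that decrease faster than 1/t do not guarantee convergence to the optimum: the iterates instead converge to a constant that depends on the initialization and the heterogeneity bound.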
Numerical and deep learning experiments support the theoretical results.
Abstract
Recent works have explored the use of momentum in local methods to enhance distributed SGD. This is particularly appealing in Federated Learning (FL), where momentum intuitively appears as a solution to mitigate the effects of statistical heterogeneity. Despite recent progress in this direction, it is still unclear if momentum can guarantee convergence under unbounded heterogeneity in decentralized scenarios, where only some workers participate at each round. In this work we analyze momentum under cyclic client participation, and theoretically prove that it remains inevitably affected by statistical heterogeneity. Similarly to SGD, we prove that decreasing step-sizes do not help either: in fact, any schedule decreasing faster than 1/t leads to convergence to a constant value that depends on the initialization and the heterogeneity bound. Numerical results…
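The step-size claim in the abstract can be made concrete with a toy simulation. The sketch below is not the paper's construction or experiment; it is a minimal illustration under assumed settings: two hypothetical clients with 1-D quadratic objectives centered at -1 and +1 (so the global optimum is 0), one client visited per round in cyclic order, heavy-ball momentum, and a step size decaying as 1/t^2, which is faster than 1/t. The function name `run_cyclic_momentum` and all parameter values are illustrative choices.

```python
def run_cyclic_momentum(x0, rounds=2000, eta0=0.1, power=2.0, beta=0.5):
    """Heavy-ball momentum on two 1-D quadratic clients visited cyclically.

    Toy setup (assumed, not from the paper): client i minimizes
    0.5 * (x - centers[i])**2, so the global optimum is x = 0.
    """
    centers = [-1.0, 1.0]
    x, m = x0, 0.0
    for t in range(rounds):
        c = centers[t % len(centers)]    # cyclic participation: one client per round
        grad = x - c                     # gradient of the active client's objective
        m = beta * m + grad              # heavy-ball momentum buffer
        eta = eta0 / (t + 1) ** power    # power > 1 decays faster than 1/t
        x -= eta * m
    return x


# Same problem, two different initializations.
final_from_pos = run_cyclic_momentum(5.0)
final_from_neg = run_cyclic_momentum(-5.0)
print(final_from_pos, final_from_neg)
```

Because the step sizes are summable when `power > 1`, the total distance the iterate can travel is bounded, so each run stalls near its own starting point instead of reaching the shared optimum at 0. Setting `power=1.0` gives a schedule at the 1/t boundary for comparison.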
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
