Local Methods with Adaptivity via Scaling
Savelii Chezhegov, Sergey Skorik, Nikolas Khachaturov, Danil Shalagin, Aram Avetisyan, Martin Takáč, Yaroslav Kholodov, Aleksandr Beznosikov

TL;DR
This paper proposes a unified framework combining local training with adaptive scaling methods like Adam to improve distributed learning efficiency, supported by theoretical analysis and practical neural network training results.
Contribution
It introduces a generic scaling enhancement to Local SGD, enabling unified analysis of adaptive methods in distributed training.
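To make the idea concrete, here is a minimal NumPy sketch of Local SGD in which every local step divides the stochastic gradient coordinate-wise by a diagonal scaling vector. It is an illustration under stated assumptions, not the paper's algorithm: this summary does not say when the scaling is updated or whether workers share it, so the sketch assumes a single shared scaling that is frozen during local steps and refreshed at each communication round with an RMSProp-style rule (no momentum). The lower bound `d_min` is also an assumption, added only to keep the toy run stable. The names `local_sgd_with_scaling`, `make_quadratic_grad`, and all parameters are hypothetical.

```python
import numpy as np


def make_quadratic_grad(a, b, noise_std=0.0):
    """Toy objective (assumed for illustration): worker m holds
    f_m(x) = 0.5 * sum(a * (x - b[m])**2), with optional gradient noise."""
    def grad_fn(x, worker_id, rng):
        noise = noise_std * rng.standard_normal(x.shape)
        return a * (x - b[worker_id]) + noise
    return grad_fn


def local_sgd_with_scaling(grad_fn, x0, num_workers=4, rounds=50,
                           local_steps=10, lr=0.1, beta=0.9,
                           d_min=1.0, seed=0):
    """Local SGD with a shared diagonal scaling (RMSProp-style sketch).

    Requires local_steps >= 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = np.ones_like(x)  # running estimate of squared gradients

    for _ in range(rounds):
        # Scaling is fixed for the whole round and bounded below by d_min.
        d = np.maximum(np.sqrt(v), d_min)

        local_models, sq_grads = [], []
        for m in range(num_workers):
            x_m = x.copy()
            for _ in range(local_steps):
                g = grad_fn(x_m, m, rng)
                x_m -= lr * g / d  # scaled local step, no communication
            local_models.append(x_m)
            sq_grads.append(g ** 2)  # squared gradient from the last local step

        # Communication round: average the models, then refresh the scaling.
        x = np.mean(local_models, axis=0)
        v = beta * v + (1.0 - beta) * np.mean(sq_grads, axis=0)

    return x


if __name__ == "__main__":
    a = np.array([1.0, 10.0])  # poorly conditioned curvature
    b = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [2.0, 1.0]])
    grad = make_quadratic_grad(a, b, noise_std=0.0)
    x = local_sgd_with_scaling(grad, np.zeros(2), rounds=200, local_steps=5)
    print(x, b.mean(axis=0))
```

On this toy problem all workers share the same curvature, so the averaged iterate heads toward the mean of the worker targets `b`. Swapping the update of `v` for a different rule is where other scaling methods would plug in; how Adam or OASIS are actually covered is left to the paper.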
Findings
Enhanced Local SGD with scaling improves convergence.
Unified analysis applies to Adam, RMSProp, OASIS.
Practical validation shows performance gains.
Abstract
The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, there is a need to organize the training process to minimize the time lost due to communication. A widely used and extensively researched technique to mitigate the communication bottleneck involves performing local training before communication. This approach is the focus of our paper. Concurrently, adaptive methods that incorporate scaling, notably led by Adam, have gained significant popularity in recent years. Therefore, this paper aims to merge the local training technique with the adaptive…
Taxonomy
Topics: Advanced Mathematical Modeling in Engineering · Matrix Theory and Algorithms · Mathematical Biology Tumor Growth
Methods: OASIS · Focus · RMSProp · Stochastic Gradient Descent · Local SGD · Adam
