Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions
Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov

TL;DR
This paper extends the theoretical understanding of Local SGD by introducing the concept of approximate quadraticity and analyzing its convergence under unbounded noise conditions, broadening its applicability.
Contribution
The paper proposes a new framework for analyzing Local SGD on near-quadratic problems without relying on Lipschitz Hessian or bounded variance assumptions.
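For reference, the two assumptions this framework avoids are commonly stated in the standard forms below. These are textbook formulations, and the paper's own conditions and constants may be stated differently. A quadratic has a constant Hessian, so it satisfies the first condition with L_H = 0. This is the sense in which a small L_H indicates closeness to a quadratic.

```latex
% Lipschitz Hessian with constant L_H
\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L_H \, \|x - y\| \quad \forall x, y \in \mathbb{R}^d
% Bounded (uniform) variance of the stochastic gradient \nabla f(x; \xi)
\mathbb{E}_{\xi} \, \|\nabla f(x; \xi) - \nabla f(x)\|^2 \le \sigma^2 \quad \forall x \in \mathbb{R}^d
```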
Findings
Convergence guarantees for Local SGD under approximate quadraticity.
Analysis of Local SGD with unbounded noise conditions.
Broader applicability of Local SGD in practical scenarios.
Abstract
Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by optimizing the use of computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD). In this method, each device performs multiple local updates before the models are averaged, which is particularly useful in distributed environments for reducing communication bottlenecks and improving scalability. A typical feature of this method is that its convergence depends on the communication frequency. For a quadratic objective with a homogeneous data distribution across all devices, however, the influence of communication frequency vanishes. As a natural consequence, subsequent studies assume a Lipschitz Hessian, since this indicates, to some extent, how similar the optimized function is to a quadratic one. However, in order to extend…
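The observation about quadratics follows from the linearity of the gradient. For f(x) = ½xᵀAx − bᵀx, averaging the workers' local steps yields exactly one SGD step at the averaged point, driven by the averaged noise. As a result, the averaged iterate does not depend on how often the workers communicate.

The following is a minimal numerical sketch of that fact. It rests on assumptions of our own: every worker has the same objective, gradient noise is additive Gaussian, the helper `local_sgd_average` is hypothetical, and the optional `quartic` term is added only to make the problem non-quadratic. It is not the authors' algorithm or analysis.

```python
import numpy as np


def local_sgd_average(A, b, x0, noise, H, lr, quartic=0.0):
    """Hypothetical helper: run Local SGD and return the workers' averaged iterate.

    Every worker minimizes the same objective (homogeneous data):
        f(x) = 0.5 * x^T A x - b^T x + (quartic / 4) * sum(x**4)
    quartic = 0 gives a pure quadratic. noise has shape (T, M, d) and holds
    the additive gradient noise of worker m at step t.
    """
    T, M, _ = noise.shape
    x = np.tile(x0, (M, 1))                             # one row per worker
    for t in range(T):
        grad = x @ A - b + quartic * x**3 + noise[t]    # stochastic gradients (A symmetric)
        x = x - lr * grad                               # local step on every worker
        if (t + 1) % H == 0:
            x[:] = x.mean(axis=0)                       # communication round: average models
    return x.mean(axis=0)


rng = np.random.default_rng(0)
d, M, T = 3, 4, 200
A = np.diag([1.0, 2.0, 3.0])
b = np.ones(d)
x0 = np.zeros(d)
noise = 0.5 * rng.standard_normal((T, M, d))  # same noise sequence reused for every run

for quartic in (0.0, 0.5):
    runs = [local_sgd_average(A, b, x0, noise, H, lr=0.05, quartic=quartic) for H in (1, 10, 50)]
    same = all(np.allclose(runs[0], r) for r in runs[1:])
    print(f"quartic={quartic}: averaged iterate independent of H? {same}")
```

Because the noise sequence is shared across runs, the quadratic case should report agreement across communication periods `H` up to floating-point rounding. Once the quartic term is added, the average of x³ over workers no longer equals the cube of the average, so the result starts to depend on `H`.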
Taxonomy
Topics: Image and Signal Denoising Methods · Numerical methods in inverse problems · Model Reduction and Neural Networks
