Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
Grigory Begunov, Alexander Tyurin

TL;DR
This paper introduces new compressed stochastic optimization methods that, under an additional structural assumption, go beyond lower bounds established under standard assumptions, enabling better scaling with the number of workers in distributed learning.
Contribution
The paper proposes two algorithms, Inkheart SGD and M4, which, under an additional structural assumption, achieve state-of-the-art complexities that go beyond the previous pessimistic limits.
Findings
The new methods improve on the complexities of prior approaches in distributed settings.
Their complexities scale with the number of workers n, going beyond the previous lower bounds.
The theoretical guarantees hold under an additional structural assumption.
Abstract
In centralized, distributed, and federated learning with stochastic gradients and n workers, it was recently shown that it is infeasible to find an ε-stationary point faster than a certain lower bound on the time in seconds, in both homogeneous and heterogeneous settings, under standard assumptions: L-smoothness, unbiased stochastic gradients with bounded variance, and lower boundedness of the function. The bound depends on the computation time, the communication speed between the workers and the server, and the dimension of the iterates and gradients. This result is pessimistic since it…
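For reference, the standard assumptions named in the abstract are usually written as follows. This is a sketch in textbook form, not the paper's own statement: the symbols x, y, ξ, f^*, σ², and x̄ are notation assumed here, and the paper's exact constants and conventions (for example, whether stationarity is measured with the squared norm) may differ.

```latex
% Assumed textbook forms; x, y, \xi, f^*, \sigma^2, and \bar{x} are notation chosen here.
\begin{align*}
&\text{$L$-smoothness:} && \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^d, \\
&\text{unbiasedness:} && \mathbb{E}_{\xi}\big[\nabla f(x; \xi)\big] = \nabla f(x), \\
&\text{bounded variance:} && \mathbb{E}_{\xi}\big[\|\nabla f(x; \xi) - \nabla f(x)\|^2\big] \le \sigma^2, \\
&\text{lower boundedness:} && f(x) \ge f^* > -\infty \quad \text{for all } x \in \mathbb{R}^d, \\
&\text{$\varepsilon$-stationarity:} && \mathbb{E}\big[\|\nabla f(\bar{x})\|^2\big] \le \varepsilon.
\end{align*}
```

Here d is the dimension of the iterates and gradients, as in the abstract, and x̄ denotes the point returned by the method.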
