Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations
Alexander Tyurin, Kaja Gruntkowska, Peter Richtárik

TL;DR
Freya PAGE is a novel asynchronous distributed optimization algorithm that achieves optimal time complexity for large-scale nonconvex finite-sum problems, effectively handling heterogeneous worker speeds and stragglers.
Contribution
The paper introduces Freya PAGE, the first method with optimal time complexity for large-scale nonconvex finite-sum optimization in heterogeneous asynchronous environments, supported by theoretical guarantees and a matching lower bound.
Findings
Freya PAGE offers improved time complexity guarantees over previous methods, including Asynchronous SGD, Rennala SGD, SPIDER, and PAGE, while requiring weaker assumptions.
The algorithm is robust to stragglers and heterogeneous worker speeds.
A tight lower bound confirms Freya PAGE's optimality in large-scale settings.
Abstract
In practical distributed systems, workers are typically not homogeneous, and due to differences in hardware configurations and network conditions, can have highly varying processing times. We consider smooth nonconvex finite-sum (empirical risk minimization) problems in this setup and introduce a new parallel method, Freya PAGE, designed to handle arbitrarily heterogeneous and asynchronous computations. By being robust to "stragglers" and adaptively ignoring slow computations, Freya PAGE offers significantly improved time complexity guarantees compared to all previous methods, including Asynchronous SGD, Rennala SGD, SPIDER, and PAGE, while requiring weaker assumptions. The algorithm relies on novel generic stochastic gradient collection strategies with theoretical guarantees that can be of interest on their own, and may be used in the design of future optimization methods. Furthermore,…
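The benefit of ignoring slow computations can be illustrated with a toy timing model. The sketch below is not the Freya PAGE algorithm or its gradient collection strategy from the paper. It only compares two hypothetical ways of gathering a fixed number of stochastic gradients, under the assumption that worker i needs a fixed time taus[i] per gradient. The function names, the taus list, and the deterministic timing model are all assumptions made for illustration.

```python
import heapq
import math


def async_collection_time(taus, num_gradients):
    """Time until `num_gradients` gradients have arrived when every worker
    computes gradients back to back and results are taken in arrival order.

    Assumption: worker i takes exactly taus[i] time units per gradient.
    """
    # Min-heap of (finish_time, worker_index) for each worker's next gradient.
    events = [(tau, i) for i, tau in enumerate(taus)]
    heapq.heapify(events)
    finish_time = 0.0
    for _ in range(num_gradients):
        # Take the earliest finished gradient, whichever worker produced it.
        finish_time, i = heapq.heappop(events)
        # That worker immediately starts its next gradient.
        heapq.heappush(events, (finish_time + taus[i], i))
    return finish_time


def sync_collection_time(taus, num_gradients):
    """Time when the work is split evenly and we wait for the slowest worker."""
    per_worker = math.ceil(num_gradients / len(taus))
    return per_worker * max(taus)


if __name__ == "__main__":
    # Hypothetical per-gradient times; the third worker is a straggler.
    taus = [1.0, 2.0, 100.0]
    print(async_collection_time(taus, 6))  # 4.0
    print(sync_collection_time(taus, 6))   # 200.0
```

In this toy setting the straggler never contributes a gradient in the arrival-order scheme, so the collection time is set by the fast workers alone, whereas the even split is dominated by the slowest worker.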
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Advanced Optimization Algorithms Research
Methods: Stochastic Gradient Descent
