TL;DR
This paper gives a new analysis of FedExProx that establishes a tighter convergence rate, shows the method can provably outperform Gradient Descent (GD) in distributed optimization once computation and communication costs are counted, and covers adaptive extrapolation strategies.
Contribution
We develop a novel analysis framework showing that FedExProx can provably outperform GD on non-strongly convex quadratic problems, and we extend the analysis to partial participation.
Findings
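The new analysis gives FedExProx a tighter linear convergence rate on non-strongly convex quadratic problems than the original analysis did.
Two adaptive extrapolation strategies, based on gradient diversity and Polyak stepsizes, significantly improve on previous results.
FedExProx provably outperforms GD once both computation and communication costs are taken into account.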
Abstract
We revisit FedExProx, a recently proposed distributed optimization method designed to enhance convergence properties of parallel proximal algorithms via extrapolation. In the process, we uncover a surprising flaw: its known theoretical guarantees on quadratic optimization tasks are no better than those offered by the vanilla Gradient Descent (GD) method. Motivated by this observation, we develop a novel analysis framework, establishing a tighter linear convergence rate for non-strongly convex quadratic problems. By incorporating both computation and communication costs, we demonstrate that FedExProx can indeed provably outperform GD, in stark contrast to the original analysis. Furthermore, we consider partial participation scenarios and analyze two adaptive extrapolation strategies (based on gradient diversity and Polyak stepsizes), again significantly outperforming previous results.…
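To make "parallel proximal algorithms with extrapolation" concrete, here is a minimal sketch of a FedExProx-style iteration on quadratic objectives. It is not the paper's code. Several things are assumptions made only for illustration: full participation, exact proximal steps, a minimizer shared by all clients, and the function names `prox_quadratic`, `fedexprox`, and `constant_extrapolation`, which are hypothetical. The constant extrapolation used here is the inverse of the largest eigenvalue of gamma times the Hessian of the averaged Moreau envelope, a standard safe choice for quadratics; it is not the adaptive (gradient diversity or Polyak) rule analyzed in the paper.

```python
import numpy as np


def prox_quadratic(x, A, b, gamma):
    """Prox of f(z) = 0.5 z^T A z - b^T z at x: solve (I + gamma A) z = x + gamma b."""
    return np.linalg.solve(np.eye(x.shape[0]) + gamma * A, x + gamma * b)


def fedexprox(As, bs, x0, gamma, alpha, num_rounds):
    """Average the clients' proximal points, then extrapolate by alpha."""
    x = x0.copy()
    for _ in range(num_rounds):
        # Each client computes its proximal point; the server averages them.
        avg_prox = np.mean(
            [prox_quadratic(x, A, b, gamma) for A, b in zip(As, bs)], axis=0
        )
        # alpha = 1 is plain averaging; alpha > 1 moves further along the same direction.
        x = x + alpha * (avg_prox - x)
    return x


def constant_extrapolation(As, gamma):
    """Assumed constant extrapolation: 1 / lambda_max(I - P), P = mean of (I + gamma A_i)^-1."""
    d = As[0].shape[0]
    P = np.mean([np.linalg.inv(np.eye(d) + gamma * A) for A in As], axis=0)
    # I - P equals gamma times the Hessian of the averaged Moreau envelope.
    return 1.0 / np.linalg.eigvalsh(np.eye(d) - P).max()


# Synthetic problem (made-up data): every A_i is singular, so no client is strongly convex.
rng = np.random.default_rng(0)
n_clients, d = 4, 5
x_star = rng.standard_normal(d)
As = []
for _ in range(n_clients):
    M = rng.standard_normal((3, d))
    As.append(M.T @ M)  # positive semidefinite with rank at most 3 < d
bs = [A @ x_star for A in As]  # all clients share the minimizer x_star

x0 = np.zeros(d)
gamma = 1.0
alpha = constant_extrapolation(As, gamma)

x_plain = fedexprox(As, bs, x0, gamma, alpha=1.0, num_rounds=50)
x_extra = fedexprox(As, bs, x0, gamma, alpha=alpha, num_rounds=50)
err_init = np.linalg.norm(x0 - x_star)
err_plain = np.linalg.norm(x_plain - x_star)
err_extra = np.linalg.norm(x_extra - x_star)
print(f"alpha={alpha:.3f}  no extrapolation: {err_plain:.3e}  extrapolated: {err_extra:.3e}")
```

In this setting each round is a gradient step of size alpha * gamma on the averaged Moreau envelope, which is why extrapolating up to the inverse smoothness constant cannot increase the error relative to alpha = 1. The sketch ignores the computation and communication costs that the paper's comparison with GD accounts for.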