Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
Tomoya Murata, Taiji Suzuki

TL;DR
This paper introduces BVR-L-PSGD, a novel distributed optimization algorithm that efficiently escapes saddle points and achieves second-order optimality with reduced communication, especially effective in low-heterogeneity data settings.
Contribution
The paper proposes BVR-L-PSGD, which combines an existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points, with communication complexity nearly matching the best known bound of BVR-L-SGD for first-order optimality.
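To illustrate the perturbation idea only, here is a minimal single-machine sketch of perturbed gradient descent on a toy objective. It is not BVR-L-PSGD: it has no local updates, no bias-variance reduced estimator, and no communication. The objective, the function names `grad` and `descend`, and the parameters `radius`, `tol`, and `cooldown` are all assumptions made for this example.

```python
import random


def grad(x, y):
    # Gradient of the toy objective f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2.
    # It has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1).
    return x, y**3 - y


def descend(x, y, steps=2000, lr=0.1, radius=0.0, tol=1e-3, cooldown=50, seed=0):
    """Gradient descent; if radius > 0, perturb the iterate when the gradient is small."""
    rng = random.Random(seed)      # seeded so the run is reproducible
    last_perturb = -cooldown       # allows a perturbation from the first step
    for t in range(steps):
        gx, gy = grad(x, y)
        small_gradient = (gx * gx + gy * gy) ** 0.5 < tol
        if radius > 0.0 and small_gradient and t - last_perturb >= cooldown:
            # Near a stationary point: add a small random perturbation so that
            # a negative-curvature direction, if one exists, can take over.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            last_perturb = t
            gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


if __name__ == "__main__":
    print("plain GD:    ", descend(1.0, 0.0))
    print("perturbed GD:", descend(1.0, 0.0, radius=1e-2))
```

Started at (1, 0), plain gradient descent never leaves the line y = 0 and converges to the saddle at the origin, while the perturbed variant ends near one of the two minima. The perturbation is also triggered near a minimum, where it is harmless because the iterate is pulled back.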
Findings
Achieves second-order optimality with low communication complexity.
Outperforms non-local methods when data heterogeneity is small.
Communication complexity approaches a constant as data heterogeneity vanishes.
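For reference, "second-order optimality" in the findings above is commonly formalized in the saddle-point literature as follows. This is the standard textbook form, given here as an assumption; the paper's exact constants and notation may differ. The symbols ε (target accuracy) and ρ (Lipschitz constant of the Hessian) are introduced for this note.

```latex
\[
  \|\nabla f(x)\| \le \varepsilon
  \qquad\text{and}\qquad
  \lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\varepsilon}.
\]
```

The first condition alone is first-order optimality and is also satisfied at saddle points; the second rules out points where the Hessian has a significantly negative eigenvalue.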
Abstract
In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees, i.e., algorithms that escape saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), which combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD achieves second-order optimality with nearly the same communication complexity as the best known complexity of BVR-L-SGD for finding first-order optimal points. Particularly, the…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Distributed Control Multi-Agent Systems
Methods: Stochastic Gradient Descent
