Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

Tomoya Murata; Taiji Suzuki

arXiv:2202.06083 · cs.LG · October 13, 2022

TL;DR

This paper introduces BVR-L-PSGD, a novel distributed optimization algorithm that efficiently escapes saddle points and achieves second-order optimality with reduced communication, especially effective in low-heterogeneity data settings.

Contribution

The paper proposes BVR-L-PSGD, combining bias-variance reduction and perturbation, to find second-order optimal points with communication efficiency comparable to first-order methods.

Findings

01 Achieves second-order optimality with low communication complexity.

02 Outperforms non-local methods when data heterogeneity is small.

03 Communication complexity approaches a constant as heterogeneity vanishes.

Abstract

In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees, i.e., algorithms that escape saddle points, have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD) that combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known complexity of BVR-L-SGD for finding first-order optimal points. Particularly, the…
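The role of parameter perturbation mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not BVR-L-PSGD: it has no local updates, no communication, and no bias-variance reduced estimator. It only shows, on a hypothetical toy objective chosen for illustration, how a small random perturbation near a stationary point lets plain gradient descent leave a strict saddle. The function names, step size, perturbation radius, and tolerance are all assumptions.

```python
import math
import random


def grad(x, y):
    # Gradient of the toy objective f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.
    # It has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1).
    return x, -y + y ** 3


def descend(x, y, steps=2000, lr=0.1, radius=0.0, tol=1e-3, seed=0):
    # Gradient descent; if radius > 0, a small uniform perturbation is added
    # whenever the gradient norm falls below tol (a simplified trigger rule,
    # assumed here for illustration only).
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        if radius > 0.0 and math.hypot(gx, gy) < tol:
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y


# Starting on the x-axis, unperturbed descent slides into the saddle at (0, 0).
plain = descend(1.0, 0.0)

# With perturbation, the iterate is pushed off the axis and reaches a minimum.
perturbed = descend(1.0, 0.0, radius=1e-2)

print("plain:", plain)
print("perturbed:", perturbed)
```

Because this simplified trigger also fires near a minimum, the perturbed run ends within a small jitter of (0, 1) or (0, -1) rather than exactly on it; practical perturbed methods typically limit how often the perturbation is applied.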

Videos

Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Distributed Control Multi-Agent Systems

Methods: Stochastic Gradient Descent