Variance Reduced Distributed Non-Convex Optimization Using Matrix Stepsizes

Hanmin Li, Avetik Karagulyan, Peter Richtárik

arXiv:2310.04614 · math.OC · October 11, 2024

Open Access

TL;DR

This paper introduces variance-reduced matrix-stepsized algorithms for distributed non-convex optimization, demonstrating improved convergence and communication efficiency over existing methods through theoretical analysis and empirical validation.

Contribution

It proposes two variance-reduced algorithms, det-MARINA and det-DASHA, that enhance the performance of matrix-stepsized gradient descent in distributed non-convex settings.

Findings

01 det-MARINA and det-DASHA outperform MARINA, DASHA, and distributed det-CGD in iteration complexity.

02 The new algorithms also outperform these baselines in communication complexity.

03 Both theoretical analysis and experiments support the improved convergence.

Abstract

Matrix-stepsized gradient descent algorithms have been shown to have superior performance in non-convex optimization problems compared to their scalar counterparts. The det-CGD algorithm, introduced by Li et al. (2023), leverages matrix stepsizes to perform compressed gradient descent for non-convex objectives and matrix-smooth problems in a federated manner. The authors establish the algorithm's convergence to a neighborhood of a weighted stationarity point under a convex condition for the symmetric and positive-definite matrix stepsize. In this paper, we propose two variance-reduced versions of the det-CGD algorithm, incorporating the MARINA and DASHA methods. Notably, we establish, both theoretically and empirically, that det-MARINA and det-DASHA outperform MARINA, DASHA, and the distributed det-CGD algorithms in terms of iteration and communication complexities.
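To make the idea of a matrix stepsize concrete, here is a minimal single-node sketch in Python. It is an illustration only, not the authors' implementation: it does not include the MARINA or DASHA variance-reduction steps, and the update form `x - D @ S @ grad`, the toy quadratic, and the names `rand_k_sketch` and `matrix_step` are assumptions chosen for the example. On a badly scaled quadratic, a scalar stepsize is limited by the largest curvature, while a matrix stepsize can rescale each direction separately.

```python
import numpy as np


def rand_k_sketch(d, k, rng):
    """Hypothetical helper: random diagonal sketch that keeps k of d
    coordinates, scaled by d/k so that the sketch is unbiased (E[S] = I)."""
    idx = rng.choice(d, size=k, replace=False)
    s = np.zeros(d)
    s[idx] = d / k
    return np.diag(s)


def matrix_step(x, grad, D, S=None):
    """One gradient step with matrix stepsize D (assumed form x - D S grad).
    S is an optional sketch (compression) matrix; None means no compression."""
    g = grad if S is None else S @ grad
    return x - D @ g


# Toy quadratic f(x) = 0.5 * x^T L x with very different curvatures.
L = np.diag([1.0, 10.0, 100.0])
x0 = np.array([1.0, 1.0, 1.0])
grad0 = L @ x0

# Scalar stepsize 1 / lambda_max(L), written as a multiple of the identity.
x_scalar = matrix_step(x0, grad0, np.eye(3) / 100.0)

# Matrix stepsize D = L^{-1}; for this quadratic one step reaches the minimizer.
x_matrix = matrix_step(x0, grad0, np.linalg.inv(L))

# A compressed step: only one coordinate of the gradient is used.
rng = np.random.default_rng(0)
S = rand_k_sketch(3, 1, rng)
x_compressed = matrix_step(x0, grad0, np.linalg.inv(L), S)

print("scalar stepsize:", x_scalar)
print("matrix stepsize:", x_matrix)
print("compressed step:", x_compressed)
```

In this toy problem the scalar step barely moves the low-curvature coordinates, while the matrix step lands on the minimizer. The compressed step is noisy, which is the kind of variance that methods such as MARINA and DASHA are designed to reduce.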

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Distributed Control Multi-Agent Systems