Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

Dmitry Kovalev; Aleksandr Beznosikov; Ekaterina Borodich; Alexander Gasnikov; Gesualdo Scutari

arXiv:2205.15136·math.OC·May 31, 2022·5 cites

Open Access

TL;DR

This paper introduces an accelerated gradient sliding method for structured convex optimization that reduces gradient computations and applies it to distributed problems, achieving optimal complexity bounds for communication and local gradient calls.

Contribution

The paper proposes an inexact accelerated gradient sliding algorithm that can skip gradient evaluations of one of the two objective components, and applies it to distributed optimization under function similarity, where it achieves optimal complexity bounds.

Findings

01

Achieves optimal gradient call complexities of $\tilde{O}(\sqrt{L_p/\mu})$ for $p$ and $\tilde{O}(\sqrt{L_q/\mu})$ for $q$.

02

First distributed algorithm to attain the lower complexity bounds for both communication and local gradient calls in distributed optimization under similarity.

03

Extends the method to distributed saddle-point problems with improved complexity bounds.

Abstract

We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is ($\mu$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For this class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of $p$ and $q$, that is, $O(\sqrt{L_p/\mu})$ and $O(\sqrt{L_q/\mu})$, respectively. This result is much sharper than the classic black-box complexity $O(\sqrt{(L_p + L_q)/\mu})$, especially when the difference between $L_p$ and $L_q$ is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The…
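To give a feel for the gap between per-component and black-box complexity, here is a minimal numerical sketch. It assumes the square-root form of the rates, $\sqrt{L/\mu}$, that is typical of accelerated methods, and it drops all constants and logarithmic factors. The function name `gradient_call_orders` and the sample values of $L_p$, $L_q$, and $\mu$ are hypothetical, chosen only for illustration; this is not the paper's algorithm, only arithmetic on the stated bounds.

```python
import math


def gradient_call_orders(L_p: float, L_q: float, mu: float) -> dict:
    """Order-of-magnitude gradient-call counts.

    Assumption: rates scale as sqrt(L / mu); constants and log factors
    are dropped, so these are orders of magnitude, not iteration counts.
    """
    if min(L_p, L_q, mu) <= 0:
        raise ValueError("L_p, L_q and mu must be positive")
    return {
        # Sliding: each component is charged only its own smoothness constant.
        "sliding_grad_p": math.sqrt(L_p / mu),
        "sliding_grad_q": math.sqrt(L_q / mu),
        # Black-box: both gradients are called at the rate of the sum.
        "black_box_each": math.sqrt((L_p + L_q) / mu),
    }


# Hypothetical, badly balanced instance: L_q is 10,000 times larger than L_p.
orders = gradient_call_orders(L_p=100.0, L_q=1.0e6, mu=1.0)
for name, value in orders.items():
    print(f"{name}: {value:.1f}")
```

On this instance the number of calls to the gradient of $p$ drops by roughly two orders of magnitude relative to the black-box rate, while the number of calls to the gradient of $q$ is essentially unchanged. The saving matters when the gradient of $p$ is the expensive one to evaluate or to communicate.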

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Distributed Control Multi-Agent Systems