A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm

Konstantin Mishchenko, Franck Iutzeler, and Jérôme Malick

arXiv:1806.09429 · math.OC · December 13, 2019 · SIAM J. Optim.

Open Access

TL;DR

This paper introduces a scalable, asynchronous distributed optimization algorithm that adapts to various system delays and communication costs, with proven convergence guarantees and practical effectiveness in large-scale machine learning.

Contribution

It presents a novel flexible delay-tolerant proximal gradient algorithm with delay-independent stepsizes and proven convergence in both strongly convex and non-strongly convex settings.

Findings

01. Converges linearly for strongly convex problems.

02. Matches the rates of the vanilla proximal gradient algorithm, measured over an epoch sequence that subsumes the delays of the system.

03. Demonstrates effectiveness in numerical experiments on large-scale machine learning problems.

Abstract

We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective is the sum of smooth functions, each local to a worker, and a non-smooth function. Unlike many existing methods, our distributed algorithm adapts to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the stepsizes depend neither on the communication delays nor on the number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case, and provide convergence guarantees for the non-strongly convex case. The obtained rates are the same as those of the vanilla proximal gradient algorithm over an introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the…
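For orientation, the vanilla proximal gradient baseline that the abstract compares against can be sketched as follows. This is a minimal synchronous illustration, not the paper's asynchronous algorithm. It assumes each worker's smooth function is a least-squares term and the non-smooth function is an l1 penalty; these choices, along with the names `soft_threshold`, `prox_grad`, `objective`, and `workers`, are hypothetical and do not come from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(local_data, lam, x):
    """Sum of the local smooth terms plus the shared non-smooth term."""
    smooth = sum(0.5 * np.sum((A @ x - b) ** 2) for A, b in local_data)
    return smooth + lam * np.sum(np.abs(x))

def prox_grad(local_data, lam, n_iters=500):
    """Synchronous proximal gradient on sum_i 0.5*||A_i x - b_i||^2 + lam*||x||_1."""
    dim = local_data[0][0].shape[1]
    # Lipschitz constant of the smooth part's gradient:
    # largest eigenvalue of sum_i A_i^T A_i (eigvalsh returns ascending order).
    hessian = sum(A.T @ A for A, _ in local_data)
    step = 1.0 / np.linalg.eigvalsh(hessian)[-1]
    x = np.zeros(dim)
    for _ in range(n_iters):
        # Every "worker" contributes the gradient of its local function;
        # here all contributions are computed in lockstep, with no delays.
        grad = sum(A.T @ (A @ x - b) for A, b in local_data)
        # Gradient step on the smooth part, then prox of the non-smooth part.
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Assumed toy setup: 3 workers, each holding 20 samples in dimension 5.
rng = np.random.default_rng(0)
workers = [(rng.standard_normal((20, 5)), rng.standard_normal(20)) for _ in range(3)]
x_hat = prox_grad(workers, lam=0.1)
```

In this synchronous sketch every iteration waits for all workers. The setting described in the abstract removes that barrier, so workers may contribute gradients computed at outdated iterates.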

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Privacy-Preserving Technologies in Data