Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

Zhiguo Wang; Jiawei Zhang; Tsung-Hui Chang; Jian Li; Zhi-Quan Luo

arXiv:2011.05082·math.OC·September 1, 2021


TL;DR

This paper introduces a distributed stochastic optimization algorithm with Nesterov momentum for non-convex, non-smooth problems. It achieves $O(1/\epsilon)$ communication complexity and is demonstrated on neural network training.

Contribution

It presents the first stochastic distributed algorithm with $O(1/\epsilon)$ communication complexity for non-convex, non-smooth problems, using a proximal primal-dual approach with Nesterov momentum.
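The two ingredients named above, a proximal step for the non-smooth term and Nesterov-style extrapolation, can be illustrated on a single node. The sketch below is not the paper's distributed primal-dual algorithm. It assumes an $\ell_1$ regularizer as the non-smooth term, a toy quadratic as the smooth term, and hypothetical names (`soft_threshold`, `prox_grad_momentum`, `beta`, `lam`) chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_momentum(grad_f, x0, step, lam, beta=0.5, iters=200):
    """Proximal gradient with a constant extrapolation weight `beta`.

    Illustrative single-node sketch only; the l1 regularizer and the
    constant `beta` are assumptions, not taken from the paper.
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)          # momentum (extrapolation) point
        x_prev = x
        # gradient step on the smooth part, then prox on the non-smooth part
        x = soft_threshold(y - step * grad_f(y), step * lam)
    return x

# Toy problem (assumed): minimize 0.5 * ||x - b||^2 + lam * ||x||_1
b = np.array([3.0, -0.2, 0.5])
x_hat = prox_grad_momentum(lambda x: x - b, np.zeros(3), step=0.5, lam=1.0)
```

For this toy problem the minimizer is known in closed form as `soft_threshold(b, 1.0)`, so `x_hat` can be checked against it. A distributed variant would additionally need consensus (dual) updates across nodes, which this sketch leaves out.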

Findings

01

Achieves $O(1/\epsilon^{2})$ computation complexity for $\epsilon$-stationary solutions.

02

Attains $O(1/\epsilon)$ communication complexity, a lower order than existing gradient-tracking-based methods.

03

Effectively applied to distributed non-convex regression and neural network classification.
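To make the gap between the two rates concrete, the arithmetic below takes an illustrative target accuracy of $\epsilon = 10^{-2}$ and ignores the constants hidden in the $O(\cdot)$ notation. Both choices are assumptions made for illustration, since the source states only the orders.

$$
\frac{1}{\epsilon^{2}} = \frac{1}{(10^{-2})^{2}} = 10^{4},
\qquad
\frac{1}{\epsilon} = \frac{1}{10^{-2}} = 10^{2},
\qquad
\frac{1/\epsilon^{2}}{1/\epsilon} = \frac{1}{\epsilon} = 10^{2}.
$$

Under this reading, the order of communication rounds is smaller than the order of gradient computations by a factor of $1/\epsilon$, and the gap widens as $\epsilon$ shrinks.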

Abstract

While many distributed optimization algorithms have been proposed for solving smooth or convex problems over networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, we show that the proposed algorithm can achieve an $ϵ$-stationary solution under a constant step size with $O(1/ϵ^{2})$ computation complexity and $O(1/ϵ)$ communication complexity. When compared to the existing gradient tracking based methods, the proposed algorithm has the same order of computation complexity but a lower order of communication complexity. To the best of our knowledge, the presented result is the first stochastic algorithm with the…
