Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data

Ahmet Alacaoglu; Hanbaek Lyu

arXiv:2203.15797·math.OC·June 26, 2023

Open Access

TL;DR

This paper analyzes stochastic projected gradient methods for constrained smooth nonconvex optimization with dependent data. It establishes a $\tilde{O}(t^{-1/4})$ convergence rate under a mild mixing condition and extends the results to several other algorithms and applications.

Contribution

It gives a convergence analysis of stochastic projected gradient methods for constrained smooth nonconvex optimization with dependent data under a mild mixing condition, improving the existing complexity bound with a simpler analysis and extending to several other algorithms.

Findings

01

Achieves a $\tilde{O}(t^{-1/4})$ convergence rate.

02

Reduces complexity from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$.

03

Extends results to stochastic proximal gradient, AdaGrad, and heavy ball momentum.
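
The rate and the complexity in the findings above are two readings of the same bound. As a short worked step, write $C$ for a hypothetical constant that absorbs problem-dependent and logarithmic factors (it is an assumption for illustration, not a quantity named in the source):

$$
C\, t^{-1/4} \le \varepsilon \iff t \ge \left(\frac{C}{\varepsilon}\right)^{4} = C^{4}\, \varepsilon^{-4}.
$$

By the same arithmetic, a complexity of order $\varepsilon^{-8}$ corresponds to a rate of order $t^{-1/8}$.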

Abstract

We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of the Moreau envelope and the gradient mapping. While classical convergence guarantees require i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our…
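
To make the setting in the abstract concrete, here is a minimal Python sketch of projected stochastic gradient descent in which the data index follows a Markov chain rather than being drawn i.i.d. It is an illustration under stated assumptions, not the authors' implementation: the tanh-based loss, the unit-ball constraint, the lazy random walk on a cycle (whose stationary distribution is uniform), the $1/\sqrt{t}$ step size, and all function names are hypothetical choices made for this example.

```python
import math
import random


def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def grad_sample(x, a, b):
    """Gradient in x of the smooth nonconvex loss 0.5 * (tanh(<a, x>) - b)^2."""
    s = math.tanh(sum(ai * xi for ai, xi in zip(a, x)))
    coeff = (s - b) * (1.0 - s * s)
    return [coeff * ai for ai in a]


def markov_projected_sgd(data, steps=2000, step0=0.5, seed=0):
    """Projected SGD with dependent samples (assumed setup, see lead-in)."""
    rng = random.Random(seed)
    n = len(data)
    x = [0.0] * len(data[0][0])
    idx = 0
    for t in range(1, steps + 1):
        # Dependent sampling: lazy random walk on a cycle of sample indices.
        idx = (idx + rng.choice((-1, 0, 1))) % n
        a, b = data[idx]
        g = grad_sample(x, a, b)
        step = step0 / math.sqrt(t)  # assumed diminishing step size
        x = project_ball([xi - step * gi for xi, gi in zip(x, g)])
    return x


def gradient_mapping_norm(x, data, step=0.5):
    """Norm of (x - proj(x - step * grad f(x))) / step for the averaged loss."""
    n = len(data)
    full = [0.0] * len(x)
    for a, b in data:
        g = grad_sample(x, a, b)
        full = [fi + gi / n for fi, gi in zip(full, g)]
    y = project_ball([xi - step * fi for xi, fi in zip(x, full)])
    return math.sqrt(sum(((xi - yi) / step) ** 2 for xi, yi in zip(x, y)))


if __name__ == "__main__":
    # Toy data: (feature vector, target) pairs chosen arbitrarily.
    toy = [([1.0, 0.5], 0.3), ([-0.5, 1.0], -0.2),
           ([0.2, -1.0], 0.5), ([1.0, 1.0], 0.0)]
    x_final = markov_projected_sgd(toy)
    print("iterate:", x_final)
    print("gradient mapping norm:", gradient_mapping_norm(x_final, toy))
```

The norm of the gradient mapping computed at the end is one of the two stationarity measures the abstract mentions. The other, the gradient of the Moreau envelope, is not computed in this sketch.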

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: AdaGrad