SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Hanseul Cho; Chulhee Yun

arXiv:2210.05995·math.OC·February 21, 2023

Open Access

TL;DR

This paper analyzes the convergence of stochastic gradient descent-ascent with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave minimax problems with Polyak-Łojasiewicz (PŁ) geometry, showing convergence rates faster than those of with-replacement SGDA and providing a lower bound for GDA.

Contribution

It provides the first theoretical convergence bounds for SGDA with reshuffling in nonconvex-Polyak-Łojasiewicz (PŁ) minimax settings, extending to mini-batch and full-batch scenarios.
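
For reference, the PŁ condition behind the "nonconvex-PŁ" setting can be written as follows. This is the standard textbook form, not a quotation from the paper; the constant \mu > 0 and the symbols x (minimization variable) and y (maximization variable) are notation assumed here. The objective may be nonconvex in x, while for every fixed x the function f(x, \cdot) satisfies:

```latex
\[
  \frac{1}{2}\,\bigl\lVert \nabla_y f(x, y) \bigr\rVert^{2}
  \;\ge\;
  \mu \Bigl( \max_{y'} f(x, y') - f(x, y) \Bigr)
  \qquad \text{for all } x,\ y .
\]
```

Strong concavity in y implies this inequality, but the inequality does not require concavity, which is why the setting counts as nonconvex-nonconcave.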

Findings

01

Faster convergence rates for SGDA-RR compared to with-replacement SGDA.

02

Extension of convergence analysis to mini-batch SGDA-RR.

03

A lower bound for GDA with arbitrary step-size ratio matching upper bounds in certain cases.

Abstract

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a…
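
The reshuffling scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sgda_rr`, the step sizes `alpha` and `beta`, and the scalar quadratic toy problem are all hypothetical, and the toy objective is strongly-convex-strongly-concave (a special case of PŁ-PŁ) chosen only so that the saddle point of the averaged objective is known to be (0, 0).

```python
import numpy as np

def sgda_rr(grad_x, grad_y, n, x0, y0, alpha, beta, epochs,
            alternating=False, seed=0):
    """SGDA with random reshuffling on a finite sum of n components.

    grad_x(i, x, y) and grad_y(i, x, y) return the partial derivatives
    of the i-th component. Each epoch draws a fresh permutation and
    visits every component exactly once (without-replacement sampling).
    """
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            gx = grad_x(i, x, y)
            if alternating:
                # Alternating: the ascent step sees the updated x.
                x = x - alpha * gx
                y = y + beta * grad_y(i, x, y)
            else:
                # Simultaneous: both gradients use the same (x, y).
                gy = grad_y(i, x, y)
                x = x - alpha * gx
                y = y + beta * gy
    return x, y

# Hypothetical toy problem (an assumption for illustration):
# f_i(x, y) = 0.5*a_i*x^2 + b*x*y - 0.5*c_i*y^2 + d_i*x + e_i*y,
# with d and e averaging to zero, so the averaged objective has its
# saddle point at (0, 0) while individual components disagree.
n = 8
a = np.linspace(0.5, 1.5, n)
c = np.linspace(1.5, 0.5, n)
d = np.linspace(-1.0, 1.0, n)
e = np.linspace(1.0, -1.0, n)
b = 0.5

def grad_x(i, x, y):
    return a[i] * x + b * y + d[i]

def grad_y(i, x, y):
    return b * x - c[i] * y + e[i]

if __name__ == "__main__":
    print(sgda_rr(grad_x, grad_y, n, 1.0, 1.0, 0.01, 0.01, 300))
    print(sgda_rr(grad_x, grad_y, n, 1.0, 1.0, 0.01, 0.01, 300,
                  alternating=True))
```

Replacing `rng.permutation(n)` with `n` independent draws from `rng.integers(n)` would give the with-replacement baseline that the abstract compares against.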

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods