SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization
Hanseul Cho, Chulhee Yun

TL;DR
This paper analyzes the convergence of stochastic gradient descent-ascent with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave minimax problems with Polyak-Łojasiewicz (PŁ) geometry, showing faster rates than with-replacement SGDA and providing a lower bound for full-batch GDA.
Contribution
It provides the first theoretical convergence bounds for SGDA with reshuffling in nonconvex-Polyak-Łojasiewicz (PŁ) minimax settings, extending to mini-batch and full-batch scenarios.
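For reference, the PŁ condition named above is commonly stated as below. This is the standard textbook form; the constant \mu and the notation are chosen here for illustration and may differ from the paper's.

```latex
% A differentiable function g is \mu-P\L{} (for some \mu > 0) if it attains its minimum and, for all y,
\frac{1}{2}\,\lVert \nabla g(y) \rVert^2 \;\ge\; \mu \bigl( g(y) - \min_{y'} g(y') \bigr).

% Nonconvex-P\L{} problem \min_x \max_y f(x, y): for every fixed x, -f(x, \cdot) is \mu-P\L{}, i.e.
\frac{1}{2}\,\lVert \nabla_y f(x, y) \rVert^2 \;\ge\; \mu \bigl( \max_{y'} f(x, y') - f(x, y) \bigr).
```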
Findings
Faster convergence rates for SGDA-RR compared to with-replacement SGDA.
Extension of convergence analysis to mini-batch SGDA-RR.
A lower bound for GDA with an arbitrary step-size ratio, which matches the upper bounds in certain cases.
Abstract
Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a…
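The without-replacement sampling scheme described in the abstract can be illustrated with a minimal sketch of simultaneous SGDA-RR on scalar variables. Everything here is an assumption made for illustration and is not taken from the paper: the function name `sgda_rr`, the toy strongly-convex-strongly-concave components f_i(x, y) = (x - a_i)^2 / 2 + x*y - (y - b_i)^2 / 2, the step sizes, and the epoch count.

```python
import random

def sgda_rr(grads, x0, y0, alpha, beta, epochs, seed=0):
    """Simultaneous SGDA with random reshuffling (illustrative sketch).

    grads: list of callables (x, y) -> (df_i/dx, df_i/dy), one per component.
    alpha, beta: step sizes for the descent (x) and ascent (y) variables.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    order = list(range(len(grads)))
    for _ in range(epochs):
        rng.shuffle(order)               # draw a fresh permutation every epoch
        for i in order:                  # use each component exactly once
            gx, gy = grads[i](x, y)      # both gradients at the same iterate
            x, y = x - alpha * gx, y + beta * gy  # descend in x, ascend in y
    return x, y

# Toy components (hypothetical data): f_i(x, y) = (x - a_i)^2/2 + x*y - (y - b_i)^2/2
a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, -1.0, 2.0]

def make_grad(ai, bi):
    # Returns the partial derivatives of f_i with respect to x and y.
    return lambda x, y: ((x - ai) + y, x - (y - bi))

grads = [make_grad(ai, bi) for ai, bi in zip(a, b)]

# The averaged objective has its saddle point at x = 1.0, y = 1.5.
x_hat, y_hat = sgda_rr(grads, x0=0.0, y0=0.0, alpha=1e-3, beta=1e-3, epochs=3000)
print(x_hat, y_hat)
```

With a constant step size the iterates settle in a small neighborhood of the saddle point rather than on it exactly. An alternating variant would, under the usual convention, compute the y-gradient at the already-updated x instead of at the shared iterate.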
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
