Negative Stepsizes Make Gradient-Descent-Ascent Converge
Henry Shugart, Jason M. Altschuler

TL;DR
This paper shows that gradient-descent-ascent (GDA) can converge on min-max problems in its original form when run with unconventional stepsize schedules that are time-varying, asymmetric, and periodically negative. This challenges the long-held view that GDA fails to converge even on simple problems.
Contribution
The authors introduce slingshot stepsize schedules, which are time-varying, asymmetric, and periodically negative. With these schedules, GDA converges on the classical counterexamples without additional building blocks such as extragradients, optimism, momentum, or anchoring.
Findings
GDA converges with negative, asymmetric, and time-varying stepsizes.
All three properties of the schedule (time variation, asymmetry, and periodic negativity) are necessary for convergence.
Convergence holds for the last iterate, which matches how GDA is used in practice.
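To make these findings concrete, here is a minimal sketch on the toy bilinear objective f(x, y) = x * y, whose saddle point is (0, 0). The objective, the function name gda_bilinear, and the two-step schedule toy_schedule are assumptions chosen by hand for illustration; this is not the slingshot schedule from the paper. The schedule alternates (alpha, beta) = (1, -0.5) and (-0.5, 1), so it is time-varying, asymmetric, and uses a negative stepsize at every step. On this toy problem the two-step update matrix has eigenvalues of modulus 0.5, so the iterates contract, while freezing the schedule at its first pair diverges.

```python
import math

def gda_bilinear(schedule, steps, x=1.0, y=1.0):
    """Simultaneous GDA on f(x, y) = x * y.

    schedule: list of (alpha, beta) stepsize pairs, cycled periodically.
    alpha is the descent stepsize for x, beta the ascent stepsize for y.
    Returns the distance of the last iterate from the saddle point (0, 0).
    """
    for t in range(steps):
        alpha, beta = schedule[t % len(schedule)]
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - alpha * grad_x, y + beta * grad_y
    return math.hypot(x, y)

# Hand-picked illustrative schedule (an assumption, not the paper's schedule).
toy_schedule = [(1.0, -0.5), (-0.5, 1.0)]

dist_toy = gda_bilinear(toy_schedule, steps=40)
# Same stepsizes with no time variation: keep only the first pair.
dist_frozen = gda_bilinear([toy_schedule[0]], steps=40)

print(f"alternating schedule: {dist_toy:.3e}")
print(f"frozen schedule:      {dist_frozen:.3e}")
```

The returned value is the distance of the last iterate from the saddle point, not of an averaged iterate.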
Abstract
Efficient computation of min-max problems is a central question in optimization, learning, games, and controls. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained…
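The failure of plain GDA mentioned in the abstract can be reproduced in a few lines. The sketch below assumes the toy bilinear objective f(x, y) = x * y (chosen here for illustration) and a constant, symmetric, positive stepsize eta; the function name gda_constant is hypothetical. Each simultaneous step multiplies the distance to the saddle point (0, 0) by exactly sqrt(1 + eta^2), so the iterates spiral outward for every eta > 0.

```python
import math

def gda_constant(eta, steps, x=1.0, y=1.0):
    """Simultaneous GDA on f(x, y) = x * y with a constant stepsize eta.

    Returns the distance to the saddle point (0, 0) after every step,
    including the starting point.
    """
    norms = [math.hypot(x, y)]
    for _ in range(steps):
        # Descent in x uses df/dx = y; ascent in y uses df/dy = x.
        x, y = x - eta * y, y + eta * x
        norms.append(math.hypot(x, y))
    return norms

eta, steps = 0.1, 100
norms = gda_constant(eta, steps)
growth = norms[-1] / norms[0]
# Closed form: the norm grows by sqrt(1 + eta^2) per step.
predicted = (1.0 + eta**2) ** (steps / 2)

print(f"observed growth:  {growth:.6f}")
print(f"predicted growth: {predicted:.6f}")
```

Shrinking eta slows the growth but never reverses it, which is why a smaller constant stepsize does not fix the problem on this example.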
Taxonomy
Topics: Neural Networks and Applications · Evolutionary Algorithms and Applications
