Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
Junchi Yang, Xiang Li, Niao He

TL;DR
This paper introduces NeAda, a nested adaptive algorithm for nonconvex minimax optimization that automatically balances primal and dual updates, achieving near-optimal convergence without prior parameter knowledge.
Contribution
The paper proposes NeAda, a novel nested adaptive framework that ensures parameter-agnostic convergence in nonconvex minimax problems, overcoming limitations of direct adaptive extensions.
Findings
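In theory, NeAda achieves near-optimal gradient complexities of Õ(ε⁻²) in the deterministic setting and Õ(ε⁻⁴) in the stochastic setting, without prior knowledge of the problem's smoothness or strong-concavity parameters.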
NeAda is robust and automatically balances primal and dual updates.
The authors present NeAda as the first algorithm to achieve both near-optimal rates and parameter-agnostic adaptation in nonconvex minimax optimization.
Abstract
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring no a priori knowledge about problem-specific parameters nor tuning of learning rates. However, when it comes to nonconvex minimax optimization, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. We provide such an example proving that the simple combination of Gradient Descent Ascent (GDA) with adaptive stepsizes can diverge if the primal-dual stepsize ratio is not carefully chosen; hence, a fortiori, such adaptive extensions are not parameter-agnostic. To address the issue, we formally introduce a Nested Adaptive framework, NeAda for short, that carries an inner loop for adaptively maximizing the dual variable with controllable stopping criteria and an outer loop for adaptively…
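The nested structure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective f(x, y) = sin(x)·y − y²/2 (nonconvex in x, strongly concave in y), the scalar AdaGrad-style stepsizes, the function name `nested_adagrad`, and the inner stopping rule ‖∇_y f‖² ≤ 1/(t+1), which stands in for the "controllable stopping criteria" the abstract mentions without specifying.

```python
import numpy as np

# Hypothetical toy objective f(x, y) = sin(x) * y - 0.5 * y**2.
# For fixed x the maximizer is y*(x) = sin(x), so the primal function is
# Phi(x) = 0.5 * sin(x)**2 with gradient sin(x) * cos(x).
def grad_x(x, y):
    return np.cos(x) * y

def grad_y(x, y):
    return np.sin(x) - y

def nested_adagrad(x0, y0, outer_steps=500, eta_x=0.5, eta_y=0.5,
                   max_inner=1000, eps=1e-12):
    """Nested adaptive descent-ascent sketch (an illustration, not the paper's code)."""
    x, y = float(x0), float(y0)
    vx = 0.0  # accumulated squared primal gradients (AdaGrad-style)
    vy = 0.0  # accumulated squared dual gradients, kept across outer steps
    for t in range(outer_steps):
        tol = 1.0 / (t + 1)  # assumed stopping rule; tightens as t grows
        # Inner loop: adaptive ascent on y until the dual gradient is small.
        for _ in range(max_inner):
            gy = grad_y(x, y)
            if gy * gy <= tol:
                break
            vy += gy * gy
            y += eta_y * gy / (np.sqrt(vy) + eps)
        # Outer loop: one adaptive descent step on x using the current y.
        gx = grad_x(x, y)
        vx += gx * gx
        x -= eta_x * gx / (np.sqrt(vx) + eps)
    return x, y

x_final, y_final = nested_adagrad(1.0, 0.0)
print("x:", x_final, "y:", y_final,
      "|grad Phi|:", abs(np.sin(x_final) * np.cos(x_final)))
```

No primal-dual stepsize ratio is tuned in this sketch: the inner tolerance decides how much dual work is done per primal step, which is the balancing idea the abstract attributes to the nested design.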
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques
Methods: AMSGrad · AdaGrad
