Two-timescale Extragradient for Finding Local Minimax Points
Jiseok Chae, Kyuwon Kim, Donghwan Kim

TL;DR
This paper introduces a two-timescale extragradient method that converges to local minimax points in nonconvex-nonconcave problems, improving upon previous methods by removing key assumptions.
Contribution
It provides a novel two-timescale extragradient algorithm with convergence guarantees to local minimax points, without requiring the Hessian with respect to the maximization variable to be nondegenerate.
Findings
Converges to local minimax points under mild conditions
Eliminates the need for nondegeneracy of the Hessian with respect to the maximization variable
Provably improves upon previous results on finding local minimax points
Abstract
Minimax problems are notoriously challenging to optimize. However, we show that the two-timescale extragradient method can be a viable solution. By utilizing dynamical systems theory, we show that it converges to points that satisfy the second-order necessary condition of local minimax points, under mild conditions under which two-timescale gradient descent ascent fails to work. This work provably improves upon all previous results on finding local minimax points, by eliminating a crucial assumption that the Hessian with respect to the maximization variable is nondegenerate.
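As a rough illustration of the contrast drawn in the abstract, the sketch below runs two-timescale gradient descent ascent (GDA) and two-timescale extragradient (EG) on the toy bilinear objective f(x, y) = x * y. This objective is chosen here for illustration and is not taken from the paper; its Hessian with respect to y is zero, so it is degenerate in the maximization variable. The step size `eta`, the timescale ratio `tau`, and the convention that x moves with step `eta / tau` while y moves with step `eta` are all assumptions of this sketch.

```python
import math

def grad(x, y):
    # Gradients of the toy objective f(x, y) = x * y (illustrative choice).
    return y, x  # (df/dx, df/dy)

def gda_step(x, y, eta, tau):
    # Two-timescale GDA: descent on x with step eta / tau, ascent on y with step eta.
    gx, gy = grad(x, y)
    return x - (eta / tau) * gx, y + eta * gy

def eg_step(x, y, eta, tau):
    # Two-timescale extragradient: take a GDA step to an extrapolated point,
    # then update the original point using the gradients evaluated there.
    gx, gy = grad(x, y)
    x_half, y_half = x - (eta / tau) * gx, y + eta * gy
    gx_half, gy_half = grad(x_half, y_half)
    return x - (eta / tau) * gx_half, y + eta * gy_half

def run(step, x0=1.0, y0=1.0, eta=0.1, tau=4.0, iters=10_000):
    # Iterate the given update rule and return the final distance to (0, 0).
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, eta, tau)
    return math.hypot(x, y)

gda_dist = run(gda_step)
eg_dist = run(eg_step)
print(f"GDA distance to origin: {gda_dist:.3e}")
print(f"EG  distance to origin: {eg_dist:.3e}")
```

On this linear example the GDA iteration matrix has eigenvalues of modulus sqrt(1 + eta^2 / tau) > 1, so the iterates spiral away from the origin, while the EG iteration matrix has eigenvalues of modulus sqrt(1 - s + s^2) with s = eta^2 / tau, which is below 1 whenever s < 1. This only shows the qualitative behavior on one toy problem and says nothing about the general nonconvex-nonconcave setting analyzed in the paper.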
Peer Reviews
Decision·ICLR 2024 poster
- This paper is the first in the literature to remove the nondegeneracy assumption. It defines natural new notions of the restricted Schur complement and the strict non-minimax point, corresponding to its assumptions.
- The paper adopts the high-order ODE of EG to resolve the issue of avoiding non-minimax points. This approach utilizes continuous dynamics techniques, which are then adeptly extended to the analysis of discrete dynamics.
- The authors could enhance their presentation by including additional examples or illustrative figures that emphasize the significance of the nondegeneracy assumption and its impact on the algorithm's practical applicability. For instance, are there any examples where two-timescale GDA fails while two-timescale EG works?
- The absence of the conclusion and discussion sections from the main text disrupts the flow and detracts from the overall reading experience.
The paper addresses the important problem of the local behaviour of two-timescale extragradient in nonmonotone problems. It is technically strong and well-written.
I only have the following remark: It seems that Thm. 6.2-6.4, Thm. 6.6 and Thm. F.3 all treat _strict_ linearly stable points. The main claim that two-timescale extragradient finds (nonstrict degenerate) local minimax points (e.g. Example 3) is found in Remark 6.8. The argument relies on showing avoidance of non-local-minimax points coupled with a _global_ convergence guarantee to a fixed point. The remark makes it appear as if global convergence for general nonconvex-nonconcave is solved, whi
The methods and conclusions in this article are valuable in terms of both originality and significance.
1. In terms of originality, the authors propose a new concept, the restricted Schur complement, to refine the second-order conditions. To study the stability, the authors introduce the concept of the *hemicurvature* to characterize the eigenvalues. These concepts and tools seem novel and fascinating.
2. Regarding significance, this paper improves upon previous results by eliminating the nond
There is room for improvement in the presentation.
1. The organization is less satisfactory and some results lack intuitive interpretation in the main text. I provide several examples below.
   * (i) The concept of hemicurvature is a bit opaque and the result of Proposition 6.7 is not intuitive. However, the figures in the appendix are a good illustration and could be put in the main text.
   * (ii) The authors mention that they adopt the hemicurvature instead of the curvature because of the proper
Taxonomy
Topics: Advanced Optimization Algorithms Research · Quantum Computing Algorithms and Architecture · Matrix Theory and Algorithms
