Second-Order Min-Max Optimization with Lazy Hessians

Lesi Chen; Chengchang Liu; Jingzhao Zhang

arXiv:2410.09568·math.OC·April 16, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a more efficient second-order optimization method for convex-concave minimax problems that reuses Hessian computations, reducing overall computational complexity and improving upon previous methods.

Contribution

It proposes a novel approach to reuse Hessians across iterations, significantly lowering computational costs for second-order minimax optimization methods.
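The reuse idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it is a plain regularized Newton iteration on a toy convex-concave objective, $f(x, y) = \sum_i \log\cosh(x_i) + x^\top B y - \sum_j \log\cosh(y_j)$, where the Jacobian of the saddle-point operator is refreshed only every few steps. The objective, the function names (`operator`, `jacobian`, `lazy_newton`) and the parameters (`refresh_every`, `reg`) are all assumptions made for illustration.

```python
import numpy as np

def operator(z, B):
    """Saddle-point operator F(z) = (grad_x f, -grad_y f) for the toy objective."""
    n = B.shape[0]
    x, y = z[:n], z[n:]
    return np.concatenate([np.tanh(x) + B @ y, -B.T @ x + np.tanh(y)])

def jacobian(z, B):
    """Jacobian of F; this plays the role of the expensive second-order oracle."""
    n = B.shape[0]
    x, y = z[:n], z[n:]
    top = np.hstack([np.diag(1.0 - np.tanh(x) ** 2), B])
    bottom = np.hstack([-B.T, np.diag(1.0 - np.tanh(y) ** 2)])
    return np.vstack([top, bottom])

def lazy_newton(z0, B, refresh_every=4, reg=0.1, iters=12):
    """Regularized Newton steps that reuse a stale Jacobian between refreshes.

    Hypothetical illustration only: the step rule and parameters are assumed,
    not taken from the paper.
    """
    z = z0.copy()
    jac_evals = 0
    for k in range(iters):
        if k % refresh_every == 0:
            # Refresh the (regularized) Jacobian snapshot.
            M = jacobian(z, B) + reg * np.eye(z.size)
            jac_evals += 1
        # Every step uses a fresh first-order oracle but the stored snapshot.
        # A practical implementation would also cache a factorization of M.
        z = z - np.linalg.solve(M, operator(z, B))
    return z, jac_evals

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
z0 = rng.standard_normal(6)
z0 = 0.5 * z0 / np.linalg.norm(z0)  # start inside a ball of radius 0.5
z, jac_evals = lazy_newton(z0, B)
print(jac_evals, np.linalg.norm(operator(z, B)))
```

With the defaults above, 12 steps trigger only 3 Jacobian evaluations, while the first-order oracle is still called at every step.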

Findings

01

Reduced computational complexity by a factor of $d^{1/3}$

02

Achieved faster convergence rates for convex-concave minimax problems

03

Numerical experiments confirm improved efficiency over existing methods

Abstract

This paper studies second-order methods for convex-concave minimax optimization. Monteiro and Svaiter (2012) proposed a method to solve the problem with an optimal iteration complexity of $O(ϵ^{-2/3})$ to find an $ϵ$-saddle point. However, it is unclear whether the computational complexity, $O((N + d^{2}) d ϵ^{-2/3})$, can be improved. In the above, we follow Doikov et al. (2023) and assume the complexity of obtaining a first-order oracle as $N$ and the complexity of obtaining a second-order oracle as $dN$. In this paper, we show that the computation cost can be reduced by reusing Hessians across iterations. Our methods achieve an overall computational complexity of $\tilde{O}((N + d^{2})(d + d^{2/3} ϵ^{-2/3}))$, which improves on that of previous methods by a factor of $d^{1/3}$. Furthermore, we generalize our method to…
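As a rough check of the claimed factor, using only the two computational bounds quoted in the abstract (the regime split and the constant 2 are worked out here, not taken from the paper): when $ϵ \le d^{-1/2}$, we have $d^{2/3} ϵ^{-2/3} \ge d$, so

$$
\frac{(N + d^{2})\, d\, ϵ^{-2/3}}{(N + d^{2})\,(d + d^{2/3} ϵ^{-2/3})}
\;\ge\; \frac{d\, ϵ^{-2/3}}{2\, d^{2/3} ϵ^{-2/3}}
\;=\; \frac{d^{1/3}}{2}.
$$

In this high-accuracy regime the saving is therefore of order $d^{1/3}$, up to the logarithmic factors hidden in $\tilde{O}$. For larger $ϵ$ the additive $d$ term dominates the new bound, and the ratio is roughly $ϵ^{-2/3}$ instead.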

Peer Reviews

Decision·ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. A new algorithm in min-max optimization with better computational complexity than existing results.
2. The paper is well organized, and the flow is easy to follow.

Weaknesses

1. The main component seems to be a combination of Doikov et al. (2023) on lazy Hessians and Adil et al. (2022) on extragradient, which may restrict the novelty a bit.
2. The experiments can be further enhanced.
   - First, the $O(d^{1/3})$ improvement suggests that the advantage holds in high-dimensional cases (but not in low-dimensional ones). The current experiments do not exhibit such a pattern; how does the algorithm perform in the low-dimensional case?
   - It is not clear how the choice of $

Reviewer 02 · Rating 8 · Confidence 4

Strengths

I guess the paper is good from a mathematical point of view. The results are strong, original, and well presented. I agree with the authors that they developed significantly new tricks to work with this class of problems. I guess the paper is good!

Weaknesses

For me, the main drawback is motivation. I do not understand why we should use a second-order method with expensive iterations rather than a first-order one. I understand the motivation in convex optimization, where the number of iterations is significantly reduced by using an optimal second-order scheme, but I do not understand it for saddle-point problems (SPP), where the difference is minor.

Reviewer 03 · Rating 8 · Confidence 3

Strengths

The authors propose a method with the same oracle complexity as state-of-the-art methods but with lower computational complexity. This means that their algorithm can achieve the same estimation error in the same number of iterations while spending fewer computational resources and less time overall. The experimental results further support this point. This makes the method more attractive from a practical point of view. The paper is clearly written and easy to understand.

Weaknesses

Overall, the paper feels like a very incremental result. The authors apply a known technique for reducing the number of Hessian computations to an existing second-order method for convex-concave min-max problems. To adapt the proposed method to strongly-convex-strongly-concave problems, the authors use a universal restart framework that works like a "wrapper" around any method for convex(-concave) problems and gives better theoretical convergence for strongly-convex(-strongly-concave) ones. Despite the fact that this p

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Quantum Computing Algorithms and Architecture