Double Momentum Method for Lower-Level Constrained Bilevel Optimization

Wanli Shi; Yi Chang; Bin Gu

arXiv:2406.17386·math.OC·September 4, 2024

Open Access 3 Reviews

TL;DR

This paper introduces a novel single-loop double-momentum hypergradient method for lower-level constrained bilevel optimization, removing restrictive assumptions and providing convergence guarantees, with demonstrated effectiveness in experiments.

Contribution

It proposes a new hypergradient leveraging nonsmooth implicit function theory and a single-loop algorithm with convergence analysis for LCBO.

Findings

01

Achieves a $(\delta,\epsilon)$-stationary point in $\tilde{O}(d_2^2\epsilon^{-4})$ iterations.

02

Removes restrictive differentiability and invertibility assumptions.

03

Demonstrates effectiveness through experiments on two applications.

Abstract

Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Many hypergradient methods have been proposed as effective solutions for large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems rely on very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. What's worse, existing methods require double-loop updates, which are sometimes less efficient. To solve this problem, in this paper, we propose a new hypergradient of LCBO leveraging the nonsmooth implicit function theorem instead of the restrictive assumptions. In addition, we propose a…

Peer Reviews

Decision·ICML 2024 Poster

Reviewer 01·Rating 5 (marginally below the acceptance threshold)·Confidence 3

Strengths

S1. The work is well motivated. Finding a simple yet effective method for lower-level constrained bilevel optimization problems is both interesting and important. S2. The paper is well written and easy to follow. The algorithm design is new and non-asymptotic convergence is provided. S3. The authors conduct numerous experiments to showcase the efficiency and effectiveness of the proposed approach. Additionally, the paper includes several ablation studies in the Appendix.

Weaknesses

W1. By Remark 2 on page 7, $\tilde{\mathcal{O}}(\frac{\sqrt{d_2}}{\delta K^{1/4}})\leq \epsilon$ implies that $K=\tilde{\mathcal{O}}(\frac{d_2^2}{\delta^4 \epsilon^4})$, NOT $\tilde{\mathcal{O}}(\frac{\sqrt{d_2}}{\delta \epsilon^4})$. Additionally, since $\delta=\mathcal{O}(\epsilon d_2^{-3/2})$ by (11), the iteration number $K=\tilde{\mathcal{O}}(\frac{d_2^8}{\epsilon^8})$. W2. The authors should consider comparing their method with closely related papers addressing lower-level constrained bil
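The exponent bookkeeping in W1 can be checked mechanically. The sketch below is illustrative only and is not taken from the paper or the review: it tracks just the exponents of $d_2$, $\delta$ and $\epsilon$, ignores the logarithmic factors hidden by $\tilde{\mathcal{O}}$, and the helper names `iterations_from_bound` and `substitute` are hypothetical.

```python
from fractions import Fraction as F

def iterations_from_bound(bound, k_power):
    """Solve bound * K**(-k_power) <= eps for K, up to constants and logs.

    `bound` maps a variable name to its exponent in the factor multiplying
    K**(-k_power). The result maps each variable to its exponent in the
    lower bound on K.
    """
    exps = dict(bound)
    # Divide both sides by eps: bound / eps <= K**k_power.
    exps["eps"] = exps.get("eps", F(0)) - 1
    # Raise both sides to the power 1 / k_power.
    return {var: e / k_power for var, e in exps.items()}

def substitute(exps, var, replacement):
    """Replace `var` by a monomial given as a {name: exponent} mapping."""
    out = {v: e for v, e in exps.items() if v != var}
    power = exps.get(var, F(0))
    for v, e in replacement.items():
        out[v] = out.get(v, F(0)) + power * e
    return out

# sqrt(d2) / (delta * K**(1/4)) <= eps, as quoted from Remark 2 in W1.
K = iterations_from_bound({"d2": F(1, 2), "delta": F(-1)}, F(1, 4))

# delta = O(eps * d2**(-3/2)), as quoted from (11) in W1.
K_sub = substitute(K, "delta", {"eps": F(1), "d2": F(-3, 2)})

print({v: str(e) for v, e in K.items()})
print({v: str(e) for v, e in K_sub.items()})
```

Under these assumptions the first mapping corresponds to $K=\tilde{\mathcal{O}}(d_2^2\delta^{-4}\epsilon^{-4})$ and the second to $K=\tilde{\mathcal{O}}(d_2^8\epsilon^{-8})$, matching the reviewer's arithmetic.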

Reviewer 02·Rating 5 (marginally below the acceptance threshold)·Confidence 3

Strengths

The proposed algorithm is a single-loop single-timescale approach.

Weaknesses

1. Assumption 3 is restrictive. Furthermore, even the problems examined in the numerical experiments fail to meet this assumption. 2. To achieve a stationary point with $\| \nabla F(x) \| \le \epsilon$, as outlined in Remark 2, the proposed algorithm requires a smoothing parameter on the order of $O(\epsilon d_2^{-3/2})$. Consequently, the algorithm would require at least approximately $\tilde{O}(d_2^8/\epsilon^8)$ iterations. It appears, however, that

Reviewer 03·Rating 5 (marginally below the acceptance threshold)·Confidence 4

Strengths

The experimental results for the proposed algorithm are strong.

Weaknesses

1. The proposed algorithm, DMLCBO, is based on the double momentum technique. In previous works, e.g., SUSTAIN [1] and MRBO [2], the double momentum technique improves the convergence rate to $\widetilde{\mathcal{O}}(\epsilon^{-3})$, while the proposed algorithm only achieves $\widetilde{\mathcal{O}}(\epsilon^{-4})$. The authors are encouraged to discuss why DMLCBO does not achieve this rate and how its theoretical techniques differ from those of the above-mentioned works. 2. In the experimental part, the

Taxonomy

Topics: Stochastic processes and financial applications · Advanced Numerical Methods in Computational Mathematics · Advanced Optimization Algorithms Research