Double Momentum Method for Lower-Level Constrained Bilevel Optimization
Wanli Shi, Yi Chang, Bin Gu

TL;DR
This paper introduces a novel single-loop double-momentum hypergradient method for lower-level constrained bilevel optimization, removing restrictive assumptions and providing convergence guarantees, with demonstrated effectiveness in experiments.
Contribution
It proposes a new hypergradient leveraging nonsmooth implicit function theory and a single-loop algorithm with convergence analysis for LCBO.
Findings
Achieves a $(\delta,\epsilon)$-stationary point in $\tilde{O}(d_2^2\epsilon^{-4})$ iterations.
Removes restrictive differentiability and invertibility assumptions.
Demonstrates effectiveness through experiments on two applications.
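The iteration count in the first finding can be read as the inversion of a rate bound. The sketch below assumes the bound quoted from Remark 2 in the peer reviews, $\tilde{O}(\sqrt{d_2}/(\delta K^{1/4})) \le \epsilon$, where $K$ is the number of iterations and $\delta$ is the smoothing parameter. It has not been checked against the paper itself, and it suppresses constants and logarithmic factors.

$$
\begin{aligned}
\frac{\sqrt{d_2}}{\delta K^{1/4}} \le \epsilon
&\iff K^{1/4} \ge \frac{\sqrt{d_2}}{\delta\,\epsilon} \\
&\iff K \ge \frac{d_2^{2}}{\delta^{4}\epsilon^{4}}.
\end{aligned}
$$

For a fixed $\delta$ this gives the $\tilde{O}(d_2^2\epsilon^{-4})$ count. If $\delta$ is instead tied to the target accuracy as $\delta = O(\epsilon d_2^{-3/2})$, as the reviews note, then $\delta^4 = O(\epsilon^4 d_2^{-6})$ and the count becomes $\tilde{O}(d_2^8\epsilon^{-8})$.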
Abstract
Bilevel optimization (BO) has recently gained prominence in many machine learning applications due to its ability to capture the nested structure inherent in these problems. Many hypergradient methods have been proposed as effective solutions for large-scale problems. However, current hypergradient methods for lower-level constrained bilevel optimization (LCBO) problems rely on very restrictive assumptions, namely that the optimality conditions satisfy differentiability and invertibility conditions, and they lack a solid analysis of the convergence rate. What's worse, existing methods require double-loop updates, which are sometimes less efficient. To solve this problem, we propose a new hypergradient for LCBO that leverages the nonsmooth implicit function theorem instead of relying on these restrictive assumptions. In addition, we propose a…
Peer Reviews
Decision: ICML 2024 Poster
S1. The work is well motivated. Finding a simple yet effective method for lower-level constrained bilevel optimization problems is both interesting and important.
S2. The paper is well written and easy to follow. The algorithm design is new and non-asymptotic convergence is provided.
S3. The authors conduct numerous experiments to showcase the efficiency and effectiveness of the proposed approach. Additionally, the paper includes several ablation studies in the Appendix.
W1. By Remark 2 on page 7, $\tilde{\mathcal{O}}(\frac{\sqrt{d_2}}{\delta K^{1/4}})\leq \epsilon$ implies that $K=\tilde{\mathcal{O}}(\frac{d_2^2}{\delta^4 \epsilon^4})$, NOT $\tilde{\mathcal{O}}(\frac{\sqrt{d_2}}{\delta \epsilon^4})$. Additionally, since $\delta=\mathcal{O}(\epsilon d_2^{-3/2})$ by (11), the iteration number $K=\tilde{\mathcal{O}}(\frac{d_2^8}{\epsilon^8})$.
W2. The authors should consider comparing their method with closely related papers addressing lower-level constrained bil
The proposed algorithm is a single-loop single-timescale approach.
1. Assumption 3 is restrictive and hard to satisfy. Furthermore, even the problems examined in the numerical experiments fail to meet this assumption.
2. In order to achieve a stationary point with $\| \nabla F(x) \| \le \epsilon$, as outlined in Remark 2, the proposed algorithm requires a smoothing parameter on the order of $O(\epsilon d_2^{-3/2})$. Consequently, the algorithm would require at least approximately $\tilde{O}(d_2^8/\epsilon^8)$ iterations. It appears, however, that
The experimental results are great for the proposed algorithm.
1. The proposed algorithm, DMLCBO, is based on the double-momentum technique. In previous works, e.g., SUSTAIN[1] and MRBO[2], the double-momentum technique improves the convergence rate to $\widetilde{\mathcal{O}}(\epsilon^{-3})$, while the proposed algorithm achieves only $\widetilde{\mathcal{O}}(\epsilon^{-4})$. The authors are encouraged to discuss why DMLCBO does not achieve this rate and how its theoretical techniques differ from those of the above-mentioned works.
2. In the experimental part, the
Taxonomy
Topics: Stochastic processes and financial applications · Advanced Numerical Methods in Computational Mathematics · Advanced Optimization Algorithms Research
