qNBO: quasi-Newton Meets Bilevel Optimization

Sheng Fang; Yong-Jin Liu; Wei Yao; Chengming Yu; Jin Zhang

arXiv:2502.01076 · cs.LG · February 4, 2025


TL;DR

This paper introduces qNBO, a framework combining quasi-Newton methods with bilevel optimization to improve computational efficiency and convergence in hierarchical learning tasks, demonstrated through various real-world applications.

Contribution

It presents a novel framework that integrates quasi-Newton algorithms into bilevel optimization, enabling faster lower-level problem solving and inverse Hessian approximation with proven convergence.
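For context, in the standard setting where the lower-level objective is strongly convex in $y$ (the nonconvex-strongly-convex setting mentioned by Reviewer 02), the implicit function theorem gives the hypergradient below. The notation ($F$, $f$, $g$, $y^*$) is chosen here for illustration and is not taken from the paper. The bracketed inverse applied to $\nabla_y f$ is the inverse Hessian-vector product the framework approximates, and $y^*(x)$ is the lower-level solution it must also compute.

```latex
\begin{aligned}
&\min_{x}\; F(x) := f\bigl(x, y^*(x)\bigr)
  \quad \text{s.t.} \quad y^*(x) = \arg\min_{y}\, g(x, y), \\[4pt]
&\nabla F(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)
    \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr).
\end{aligned}
```

The formula follows from differentiating the optimality condition $\nabla_y g(x, y^*(x)) = 0$ with respect to $x$, which gives the Jacobian of $y^*$ as $-[\nabla^2_{yy} g]^{-1}\nabla^2_{yx} g$.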

Findings

1. Comparable or superior performance in real-world tasks
2. Efficient approximation of inverse Hessian-vector products
3. Non-asymptotic convergence analysis of BFGS in this context
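To illustrate the last two findings together, the sketch below runs BFGS on a hypothetical strongly convex quadratic lower-level objective $g(y) = \tfrac12 y^\top A y - c^\top y$ and then reuses the final inverse-Hessian approximation $H$ to approximate $A^{-1} v$ without a separate linear solve. This is a generic illustration of the idea described in the abstract, not the qNBO algorithm itself. The exact line search only works because the objective is quadratic, and `bfgs_quadratic`, `A`, `c`, and `v` are assumptions made for the example.

```python
import numpy as np

def bfgs_quadratic(A, c, y0, tol=1e-10, max_iter=50):
    """Minimize g(y) = 0.5 y^T A y - c^T y with BFGS and exact line search.

    Returns the approximate minimizer and the final inverse-Hessian
    approximation H, which can be reused for inverse Hessian-vector products.
    """
    n = y0.size
    I = np.eye(n)
    H = np.eye(n)                  # initial inverse-Hessian approximation
    y = y0.copy()
    grad = A @ y - c               # gradient of the quadratic
    for _ in range(max_iter):
        if np.linalg.norm(grad) < tol:
            break
        p = -H @ grad                          # quasi-Newton search direction
        alpha = -(grad @ p) / (p @ A @ p)      # exact step size (quadratic only)
        s = alpha * p
        y = y + s
        new_grad = A @ y - c
        delta_g = new_grad - grad              # gradient difference
        rho = 1.0 / (delta_g @ s)
        # Standard BFGS update of the inverse-Hessian approximation.
        H = (I - rho * np.outer(s, delta_g)) @ H @ (I - rho * np.outer(delta_g, s)) \
            + rho * np.outer(s, s)
        grad = new_grad
    return y, H

# Hypothetical lower-level Hessian (symmetric positive definite) and linear term.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c = np.array([1.0, 2.0, 3.0])
y_star, H = bfgs_quadratic(A, c, np.zeros(3))

# Stand-in for the upper-level gradient with respect to y.
v = np.array([0.5, -1.0, 2.0])
u = H @ v   # reuses H instead of solving A u = v from scratch
```

On a quadratic with exact line search, BFGS recovers $A^{-1}$ after $n$ steps unless it terminates earlier, so here `u` matches $A^{-1}v$ up to rounding. For a general lower-level objective, $H$ is only an approximation, which is why convergence guarantees for the approximation matter.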

Abstract

Bilevel optimization, which addresses challenges in hierarchical learning tasks, has gained significant interest in machine learning. Applying gradient descent to bilevel optimization in practice encounters computational hurdles, notably computing the exact lower-level solution and the inverse Hessian of the lower-level objective. Although these two aspects are inherently connected, existing methods typically handle them separately, by solving the lower-level problem and a linear system for the inverse Hessian-vector product. In this paper, we introduce a general framework to address these computational challenges in a coordinated manner. Specifically, we leverage quasi-Newton algorithms to accelerate the resolution of the lower-level problem while efficiently approximating the inverse Hessian-vector product. Furthermore, by exploiting the superlinear…
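The "separate" approach mentioned in the abstract solves a linear system $A u = b$, where $A$ is the lower-level Hessian and $b$ the upper-level gradient with respect to $y$. A common way to do this using only Hessian-vector products is conjugate gradient. The sketch below is a generic CG solver, not the implementation of any specific baseline in the paper; the matrix `A`, the vector `b`, and the function name `conjugate_gradient` are hypothetical.

```python
import numpy as np

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=100):
    """Solve A u = b for symmetric positive definite A, given only u -> A u."""
    u = np.zeros_like(b)
    r = b - hvp(u)          # residual
    p = r.copy()            # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = hvp(p)
        alpha = rs_old / (p @ Ap)
        u = u + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next A-conjugate direction
        rs_old = rs_new
    return u

# Hypothetical SPD lower-level Hessian and right-hand side.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u = conjugate_gradient(lambda w: A @ w, b)   # close to [1/11, 7/11]
```

Each CG iteration costs one Hessian-vector product, and this work is done in addition to solving the lower-level problem; that extra, separate cost is what the coordinated quasi-Newton approach aims to reduce.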

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 3

Strengths

The proposed method avoids costly second-order computations for approximating the Hessian-vector product while achieving a comparable convergence rate.

Weaknesses

The paper is challenging to follow, with a vague explanation of the quasi-Newton recursion scheme. The design of the proposed method appears overly complex compared to existing methods, and there is limited explanation regarding the validity of $u_{k+1}$ as an accurate approximation for the Hessian-vector product.
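As background on quasi-Newton recursions, the standard L-BFGS two-loop recursion computes $Hv$ from stored curvature pairs $(s_i, y_i)$ without ever forming $H$. Whether the paper's recursion for $u_{k+1}$ takes exactly this form is not stated in this summary, so treat the sketch as a reference point rather than as qNBO's scheme. The function `two_loop_recursion` and the diagonal test matrix are hypothetical.

```python
import numpy as np

def two_loop_recursion(v, s_list, y_list):
    """Return H v, where H is the L-BFGS inverse-Hessian approximation
    built from curvature pairs (s_i, y_i) stored oldest first."""
    q = v.copy()
    coeffs = []
    # First loop: newest pair to oldest pair.
    for s, y in reversed(list(zip(s_list, y_list))):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q = q - a * y
        coeffs.append((rho, a))
    # Initial matrix H0 = gamma * I, scaled using the newest pair.
    s_last, y_last = s_list[-1], y_list[-1]
    gamma = (s_last @ y_last) / (y_last @ y_last)
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for (s, y), (rho, a) in zip(zip(s_list, y_list), reversed(coeffs)):
        b = rho * (y @ r)
        r = r + s * (a - b)
    return r

# Hypothetical diagonal Hessian; steps along coordinate axes are A-conjugate.
A = np.diag([2.0, 5.0, 10.0])
s_list = [np.eye(3)[i] for i in range(3)]
y_list = [A @ s for s in s_list]
u = two_loop_recursion(np.ones(3), s_list, y_list)   # approximately [0.5, 0.2, 0.1]
```

Because the stored steps here are $A$-conjugate, the recursion reproduces $A^{-1}v$ exactly. With general pairs it yields only an approximation, though it always satisfies the secant equation $H y = s$ for the newest pair.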

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The paper is well written and easy to follow. Quasi-Newton-type methods for bilevel optimization have not been well studied, even in the nonconvex-strongly-convex setting. This work provides a quite general algorithmic framework, allowing any quasi-Newton method to be applied in qNBO.
2. A convergence rate and complexity analysis are provided for qNBO (BFGS). The technical derivations seem to be nontrivial. The authors incorporate the superlinear convergence of BFGS into the non-asymptoti…

Weaknesses

1. For the theory, the results in Theorems 3.3 and 3.6 are limited to the setting where $Q_k=k+1$. As a result, qNBO is a double-loop algorithm. Is it possible to design a single-loop version? Recent progress has been made toward single-loop bilevel optimization algorithms, especially in the nonconvex-strongly-convex setting, by using a warm-start strategy.
2. For the experiments, since the value of $Q_k$ affects the running time, it would be beneficial to empirically demonstrate how increasing…

Reviewer 03 · Rating 5 · Confidence 5

Strengths

- The motivation and approach in the paper are quite interesting: it makes sense to accelerate convergence of the inner problem using quasi-Newton approaches. It is also nice that the authors provide convergence rates for their algorithm, although the analysis does not seem to capture the benefits of quasi-Newton methods for computing the hypergradient (when computing the iterates $u_k$).
- The paper is overall clearly written and well explained, which makes it easy to read.
- The method seems to give qui…

Weaknesses

- Experiments: The results in the toy experiment looked suspicious to me: AID-TN, AID-BIO and AmigoCG performed unreasonably poorly on a fairly simple example where they are supposed to perform well. I checked the implementation provided in the supplementary material and found a number of bugs that explain these results. Please refer to the questions section for the details of these bugs. I think these can be easily fixed. However, after fixing these bugs, the results do not exactly ma…

Taxonomy

Topics: Advanced Numerical Analysis Techniques · Iterative Methods for Nonlinear Equations · Numerical Methods and Algorithms