Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm

Yuman Wu; Xiaochuan Gong; Jie Hao; Mingrui Liu

arXiv:2603.00027·math.OC·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a new class of bilevel optimization problems with lower-level uniform convexity, providing a novel theoretical framework and an efficient stochastic algorithm with provable convergence guarantees.

Contribution

It characterizes the smoothness of the hyperobjective under lower-level uniform convexity and proposes UniBiO, a stochastic algorithm with a provable $\widetilde{O}(\epsilon^{-5p+6})$ oracle complexity bound for this class.

Findings

01

Established a new implicit differentiation theorem for uniformly convex lower-level functions.

02

Designed UniBiO, a stochastic algorithm with convergence guarantees.

03

Achieved an $\widetilde{O}(\epsilon^{-5p+6})$ oracle complexity bound, which reduces to $\widetilde{O}(\epsilon^{-4})$ in the strongly convex case $p = 2$.

Abstract

Bilevel optimization is a hierarchical framework where an upper-level optimization problem is constrained by a lower-level problem, commonly used in machine learning applications such as hyperparameter optimization. Existing bilevel optimization methods typically assume strong convexity or Polyak-Łojasiewicz (PL) conditions for the lower-level function to establish non-asymptotic convergence to a solution with small hypergradient. However, these assumptions may not hold in practice, and recent work (Chen et al., 2024) has shown that bilevel optimization is inherently intractable for general convex lower-level functions with the goal of finding small hypergradients. In this paper, we identify a tractable class of bilevel optimization problems that interpolates between lower-level strong convexity and general convexity via lower-level uniform convexity. For uniformly…
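For readers unfamiliar with the terminology, the setting can be written out as follows. This is a sketch using the textbook definitions, not the paper's own statement: the symbols $f$, $g$, $\Phi$, $y^*(x)$, $\mu$ and the $\mu/p$ normalization are assumptions made here for illustration, and the paper's constants may differ.

```latex
% Bilevel problem and hyperobjective (notation assumed, not taken from the paper)
\min_{x \in \mathbb{R}^{d_x}} \Phi(x) := f\bigl(x, y^*(x)\bigr),
\qquad y^*(x) \in \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y).

% Uniform convexity of g(x, \cdot) with exponent p \ge 2 and modulus \mu > 0
g(x, y') \ge g(x, y) + \langle \nabla_y g(x, y),\, y' - y \rangle
          + \frac{\mu}{p}\,\|y' - y\|^{p}
\quad \text{for all } y, y'.

% Setting p = 2 recovers \mu-strong convexity
g(x, y') \ge g(x, y) + \langle \nabla_y g(x, y),\, y' - y \rangle
          + \frac{\mu}{2}\,\|y' - y\|^{2}.
```

A standard example is $y \mapsto |y|^p / p$ with $p > 2$: its second derivative $(p-1)|y|^{p-2}$ vanishes at $y = 0$, so it is not strongly convex near the origin, yet it is uniformly convex with exponent $p$.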

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper addresses an important and timely problem by extending bilevel optimization analysis beyond the conventional lower-level strong convexity (LLSC) assumption.
2. The introduction of lower-level uniform convexity (LLUC) as an intermediate class is a novel theoretical contribution.
3. A new implicit differentiation theorem is derived for the LLUC setting.

Weaknesses

1. The practical motivation for LLUC is not sufficiently justified; the provided examples (e.g., $\ell_p$-regression) appear contrived and do not reflect modern, complex bilevel learning tasks.
2. The theoretical framework relies on multiple technical and non-standard assumptions to establish convergence.
3. The proposed UniBiO algorithm appears structurally similar to existing methods (e.g., BO-REP), with limited algorithmic innovation.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper identifies a novel and tractable class of bilevel problems with uniformly convex lower-level functions, providing a crucial pathway between strong convexity and general convexity.
2. The presentation is clear and well structured.

Weaknesses

The oracle complexity $\widetilde{O}(\epsilon^{-5p+6})$ becomes prohibitively high for large $p$, creating a significant gap between theoretical tractability and practical efficiency for near-general convex problems.
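To make the reviewer's point concrete, the following minimal sketch evaluates the exponent in the reported bound for a few values of $p$. The function names `oracle_exponent` and `oracle_calls` are hypothetical helpers introduced here, and the calculation ignores constants and logarithmic factors, so the numbers indicate scaling only, not actual oracle counts.

```python
import math


def oracle_exponent(p: float) -> float:
    """Return k such that the reported bound reads O~(eps^-k), i.e. k = 5p - 6."""
    if p < 2:
        # The bound is stated for uniform-convexity exponents p >= 2.
        raise ValueError("p must be at least 2")
    return 5 * p - 6


def oracle_calls(eps: float, p: float) -> float:
    """Scaling of the bound eps^-(5p - 6), dropping constants and log factors."""
    return eps ** (-oracle_exponent(p))


if __name__ == "__main__":
    eps = 0.1
    for p in (2, 3, 4, 6):
        k = oracle_exponent(p)
        # Report the order of magnitude rather than the raw float.
        magnitude = round(math.log10(oracle_calls(eps, p)))
        print(f"p = {p}: exponent = {k}, eps^-k at eps = {eps} is about 1e{magnitude}")
```

At $p = 2$ the exponent is $4$, while at $p = 4$ it is already $14$, which is the growth the reviewer describes.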

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. This paper introduces the concept of lower-level uniform convexity (LLUC) to bilevel optimization, which is new.
2. For uniformly convex lower-level functions with exponent $p \ge 2$, the authors establish an implicit differentiation theorem that characterizes the smoothness of the hyperobjective.
3. The authors design a stochastic algorithm, termed UniBiO, with provable convergence guarantees, and their algorithm achieves an $\tilde{O}(\epsilon^{-5p+6})$ oracle complexity bound for finding …

Weaknesses

1. My primary concerns relate to the assumptions and presentation of the paper:
1-1. In Assumption 3.2(v), the notation $d[y]^{o~p-1}$ is not clearly defined. Although the authors refer to Theorem 3.1, this theorem does not appear in the text. The authors also refer to Theorem A.1, which does not appear in the text either (it may be Definition A.1). Such omissions are critical, as this definition is important in the paper. Moreover, similar issues exist elsewhere in the paper, e.g., in line 209, it

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Optimization and Variational Analysis