Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints

Cac Phan; Kai Wang

arXiv:2511.09845·math.OC·November 18, 2025


Open Access · 3 Reviews

TL;DR

This paper gives the first finite-time convergence guarantees for a first-order method on linearly constrained stochastic bilevel optimization problems, using only gradient information and no Hessian or other second-order computations.

Contribution

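It presents a first-order algorithm with finite-time guarantees for linearly constrained stochastic bilevel optimization, a setting where existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide only asymptotic convergence results.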
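
For orientation, one common way to write a linearly constrained stochastic bilevel problem is sketched below. The symbols $f$, $A$, $B$, $b$, $\xi$, $\zeta$ and the exact form of the constraint are assumptions chosen for illustration (only the lower-level objective $g$ and the solution map $y^*(x)$ are named in the reviews on this page), so the paper's own formulation may differ.

$$
\min_{x \in \mathbb{R}^{d_x}} \; F(x) := \mathbb{E}_{\xi}\big[f(x, y^*(x); \xi)\big]
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y \,:\, Ax + By \le b} \; \mathbb{E}_{\zeta}\big[g(x, y; \zeta)\big]
$$

In this reading, a "fully first-order" method queries only stochastic gradients of $f$ and $g$ and never forms Hessians or other second-order derivatives.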

Findings

1. Achieves finite-time convergence guarantees for constrained stochastic bilevel problems.

2. Provides explicit bounds on hypergradient bias and variance.

3. Establishes the first finite-time complexity for this class of problems.

Abstract

This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods, requiring solely gradient information without any Hessian computations or second-order derivatives. We address the unprecedented challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination that has remained theoretically intractable until now. While existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide merely asymptotic convergence results, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental…
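
To make "hypergradients from gradients alone" concrete, here is a minimal sketch of a value-function penalty estimate on a toy problem. It is not the paper's algorithm: the toy problem is unconstrained and deterministic, the smoothed-penalty and dual-solution machinery mentioned in the abstract is omitted, and every name (`A`, `C`, `lam`, `penalty_hypergradient`, and so on) is hypothetical. The estimate used is $\nabla_x f(x, y_\lambda) + \lambda\,(\nabla_x g(x, y_\lambda) - \nabla_x g(x, y^*))$, where $y_\lambda$ approximately minimizes $f + \lambda g$ over $y$ and $y^*$ approximately minimizes $g$.

```python
# Toy bilevel problem (hypothetical, chosen so the exact answer is known):
#   upper level: f(x, y) = 0.5 * (y - C)**2 + 0.5 * x**2
#   lower level: g(x, y) = 0.5 * (y - A * x)**2, so y*(x) = A * x
A, C = 2.0, 1.0


def grad_f_x(x, y):
    return x


def grad_f_y(x, y):
    return y - C


def grad_g_x(x, y):
    return -A * (y - A * x)


def grad_g_y(x, y):
    return y - A * x


def gradient_descent(grad, y0, step, iters=500):
    """Plain gradient descent on a scalar variable; uses gradients only."""
    y = y0
    for _ in range(iters):
        y -= step * grad(y)
    return y


def penalty_hypergradient(x, lam):
    """First-order hypergradient estimate via a value-function penalty."""
    # Approximate lower-level solution y*(x) = argmin_y g(x, y).
    y_star = gradient_descent(lambda y: grad_g_y(x, y), 0.0, step=0.5)
    # Approximate penalized solution y_lam = argmin_y f(x, y) + lam * g(x, y).
    # That objective is (1 + lam)-smooth in y, hence the step size.
    y_lam = gradient_descent(
        lambda y: grad_f_y(x, y) + lam * grad_g_y(x, y),
        0.0,
        step=1.0 / (1.0 + lam),
    )
    # No Hessian of g appears anywhere in this expression.
    return grad_f_x(x, y_lam) + lam * (grad_g_x(x, y_lam) - grad_g_x(x, y_star))


def exact_hypergradient(x):
    """Closed form for this toy: F(x) = 0.5*(A*x - C)**2 + 0.5*x**2."""
    return A * (A * x - C) + x


if __name__ == "__main__":
    for lam in (10.0, 100.0, 1000.0):
        gap = abs(penalty_hypergradient(1.0, lam) - exact_hypergradient(1.0))
        print(f"lam={lam:7.1f}  |estimate - exact| = {gap:.5f}")
```

On this toy problem the gap to the exact hypergradient equals $|A(Ax - C)|/(1+\lambda)$ up to inner-solver error, so it shrinks as the penalty weight grows. Handling linear lower-level constraints and stochastic gradients, which is the paper's actual subject, requires additional ingredients not shown here.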

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. First-order methods for the constrained bilevel optimization problem have not been fully studied before. 2. The authors provide proof sketches for better understanding.

Weaknesses

**There are so many presentation problems that I doubt the correctness of the proof.** 1. In line 191, there is an incomplete sentence. 2. There is no explanation of Algorithm 1 before Remark 4.1, so it contains many undefined notations. 3. There is no update rule for $\tilde{\lambda}(x)$. 4. For the stochastic algorithm, are the authors sure that we can get $\|\tilde{y}^*(x)-y^*(x)\|\leq\mathcal{O}(\delta)$ rather than a bound in expectation over the samples ($\mathbb{E}[…

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. First finite-time guarantee for stochastic bilevel problems with linearly constrained LL subproblems, using a fully first-order method. 2. The presentation is clear.

Weaknesses

1. The experimental evaluation is limited; additional large-scale experiments would be valuable to demonstrate the method’s scalability and practical relevance. 2. The LICQ assumption appears somewhat strong. Could the authors consider relaxing it to a weaker constraint qualification, or provide more discussion on why this assumption is essential for the current analysis?

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. This paper provides a finite-time stochastic convergence guarantee with linear constraints and first-order access. 2. The paper also has strong theoretical grounding (bias/variance analysis and Goldstein stationarity), and the proposed method has superior scalability for high-dimensional problems.

Weaknesses

1. This paper claims to provide the first finite-time convergence guarantees. However, there are several works on constraints in bilevel optimization, such as "Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions". Can the authors provide a comparison? 2. It looks like Assumption 3.1 (ii) requires the lower-level $g$ to be strongly convex and also to have a bounded gradient. Can the authors verify this assumption? 3. This paper also em…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis · Risk and Portfolio Optimization