Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints
Cac Phan, Kai Wang

TL;DR
This paper introduces the first finite-time convergence guarantees for a fully first-order method for linearly constrained stochastic bilevel optimization, using only gradient information and no second-order derivatives.
Contribution
It presents a novel first-order algorithm with finite-time guarantees for constrained stochastic bilevel optimization, a setting for which existing methods provided no such guarantees.
Findings
Achieves finite-time convergence guarantees for constrained stochastic bilevel problems.
Provides explicit bounds on hypergradient bias and variance.
Establishes the first finite-time complexity for this class of problems.
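To make "bias and variance" of a hypergradient estimate concrete, here is a hypothetical Monte Carlo sketch. It does not reproduce the paper's estimator or its bounds. It uses a penalty-style estimate with penalty parameter `delta` on an assumed one-dimensional toy problem. Each approximate lower-level solution is perturbed with Gaussian noise of standard deviation `sigma`, which stands in for stochastic inner-solver error (also an assumption). In this toy problem the bias at x = -2 is exactly 3*delta/(1+delta) and the variance is 2*sigma**2/delta**2, so shrinking `delta` trades bias for variance.

```python
import random
import statistics

# Hypothetical toy problem (not from the paper):
#   upper level: f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2
#   lower level: g(x, y) = 0.5*(y - x)**2  subject to  y <= 0
# For x < 0 the true hypergradient is 2x - 1.


def y_star(x):
    # Exact lower-level solution: clip the unconstrained minimizer x to y <= 0.
    return min(x, 0.0)


def y_delta(x, delta):
    # Exact minimizer of g + delta * f over y <= 0 (1D quadratic, then clip).
    return min((x + delta) / (1.0 + delta), 0.0)


def noisy_estimate(x, delta, sigma, rng):
    # Penalty-style estimate:
    #   grad_x f(x, y_d) + (grad_x g(x, y_d) - grad_x g(x, y_s)) / delta,
    # with grad_x f = x and grad_x g(x, y) = x - y. Each inner solution gets
    # independent Gaussian error (an assumed model of solver noise).
    ys = y_star(x) + rng.gauss(0.0, sigma)
    yd = y_delta(x, delta) + rng.gauss(0.0, sigma)
    return x + ((x - yd) - (x - ys)) / delta


def bias_and_variance(x, delta, sigma=1e-3, n=20000, seed=0):
    rng = random.Random(seed)
    samples = [noisy_estimate(x, delta, sigma, rng) for _ in range(n)]
    true_grad = 2.0 * x - 1.0  # valid only for x < 0
    return statistics.fmean(samples) - true_grad, statistics.pvariance(samples)


for delta in (0.1, 0.01):
    bias, var = bias_and_variance(-2.0, delta)
    print(f"delta={delta}: bias={bias:.4f}, variance={var:.2e}")
```

With the defaults, delta = 0.1 should give a bias near 0.27 and a variance near 2e-4. Delta = 0.01 should give a bias near 0.03 and a variance near 2e-2.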
Abstract
This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods. The method requires solely gradient information, with no Hessian computations or other second-order derivatives. We address the challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination for which no such guarantees previously existed. Existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide only asymptotic convergence results. In contrast, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental…
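The abstract describes hypergradient approximations built from approximate lower-level solutions using only first-order information. The sketch below is a rough illustration of that general idea and is not the authors' algorithm: it has no smoothing and no dual variable. It uses a value-function penalty estimate of the kind used in fully first-order bilevel methods:

grad F(x) ≈ grad_x f(x, y_delta) + (grad_x g(x, y_delta) - grad_x g(x, y*)) / delta,

where y_delta minimizes g + delta*f over the feasible set. The one-dimensional toy problem, its constraint y <= 0 (which does not depend on x, so multiplier terms drop out), and all function names are hypothetical. Both lower-level problems are solved by projected gradient descent, so only first derivatives are used.

```python
# Hypothetical toy problem (not from the paper):
#   upper level: f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2
#   lower level: y*(x) = argmin_{y <= 0} g(x, y),  g(x, y) = 0.5*(y - x)**2
# True hypergradient: 2x - 1 for x < 0, and x for x > 0.


def grad_f_x(x, y):
    return x


def grad_f_y(x, y):
    return y - 1.0


def grad_g_x(x, y):
    return x - y


def grad_g_y(x, y):
    return y - x


def project(y):
    # Euclidean projection onto the feasible set {y : y <= 0}.
    return min(y, 0.0)


def solve_lower(x, delta=0.0, step=0.1, iters=2000):
    # Projected gradient descent on g(x, .) + delta * f(x, .) over y <= 0.
    y = 0.0
    for _ in range(iters):
        y = project(y - step * (grad_g_y(x, y) + delta * grad_f_y(x, y)))
    return y


def penalty_hypergradient(x, delta=1e-3):
    y_star = solve_lower(x)           # approximates y*(x)
    y_delta = solve_lower(x, delta)   # minimizer of the penalized problem
    # Finite difference of lower-level x-gradients, scaled by 1/delta.
    return grad_f_x(x, y_delta) + (grad_g_x(x, y_delta) - grad_g_x(x, y_star)) / delta


for x in (-2.0, 1.5):
    print(x, penalty_hypergradient(x))
```

For x = -2 the estimate should be close to the true value 2x - 1 = -5, with an error of order delta. For x = 1.5 the constraint is active at both solutions and the estimate equals x = 1.5. Note that the accuracy of the inner solves matters: any error in y* or y_delta is amplified by 1/delta.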
Peer Reviews
Decision: Submitted to ICLR 2026
1. First-order methods for constrained bilevel optimization have not been fully studied before. 2. The authors provide proof sketches that aid understanding.
**There are so many presentation problems that I doubt the correctness of the proof.** 1. Line 191 contains an incomplete sentence. 2. Algorithm 1 is not explained before Remark 4.1, so it contains many undefined notations. 3. There is no update rule for $\tilde{\lambda}(x)$. 4. For the stochastic algorithm, are the authors sure that we can get $\|\tilde{y}^*(x)-y^*(x)\|\leq\mathcal{O}(\delta)$ rather than a bound in expectation over the samples ($\mathbb{E}[…
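The fourth point asks whether an inner-solution error bound can hold deterministically or only in expectation. Here is a hypothetical one-dimensional sketch, not taken from the paper. Run SGD with step size 1/(k+1) on g(y) = 0.5*(y - x)**2 using gradients corrupted by Gaussian noise. The iterate after K steps equals x minus the average of the K noise draws, so the mean squared error is exactly sigma**2/K. Even so, roughly 5% of runs land farther than 2*sigma/sqrt(K) from y* = x. With unbounded noise, such error bounds therefore hold in expectation or with high probability, not surely.

```python
import random

# Hypothetical lower-level problem: g(y) = 0.5*(y - x)**2, so y* = x.
# Gradients are observed with additive noise xi ~ N(0, sigma**2).


def sgd_lower(x, iters, sigma, rng):
    y = 0.0  # the first step (size 1) discards this starting point
    for k in range(iters):
        noisy_grad = (y - x) + rng.gauss(0.0, sigma)
        y -= noisy_grad / (k + 1)  # step 1/(k+1) for strong convexity mu = 1
    return y


def error_stats(x=1.0, iters=100, sigma=1.0, trials=2000, seed=0):
    rng = random.Random(seed)
    errors = [abs(sgd_lower(x, iters, sigma, rng) - x) for _ in range(trials)]
    mse = sum(e * e for e in errors) / trials
    threshold = 2.0 * sigma / iters ** 0.5
    exceed_fraction = sum(e > threshold for e in errors) / trials
    return mse, exceed_fraction


mse, exceed = error_stats()
print(f"empirical MSE = {mse:.4f} (theory: 0.0100), runs beyond 2*sigma/sqrt(K): {exceed:.1%}")
```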
1. First finite-time guarantee for stochastic bilevel problems with linearly constrained lower-level (LL) subproblems, using a fully first-order method. 2. The presentation is clear.
1. The experimental evaluation is limited; additional large-scale experiments would be valuable to demonstrate the method’s scalability and practical relevance. 2. The linear independence constraint qualification (LICQ) assumption appears somewhat strong. Could the authors consider relaxing it to a weaker constraint qualification, or explain in more detail why this assumption is essential for the current analysis?
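For linear lower-level constraints, LICQ at a point y requires the constraint gradients active at y to be linearly independent. The sketch below assumes constraints of the form A y <= b in y, which is a hypothetical form; the paper's constraints may also depend on x. It checks LICQ with an exact rank computation, and the function names are illustrative. The example shows LICQ failing only because of a redundant active constraint. In that same example, the Mangasarian-Fromovitz condition (MFCQ) still holds, since d = (-1, -1) strictly decreases all three active constraints. This illustrates why a weaker constraint qualification can be less restrictive.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination over rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot in this column among the rows not yet used as pivots.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * p for a, p in zip(m[i], m[rank])]
        rank += 1
    return rank


def licq_holds(A, b, y, tol=1e-9):
    """LICQ for A y <= b at y: the rows of A active at y are linearly independent."""
    active = [row for row, b_i in zip(A, b)
              if abs(sum(a * y_j for a, y_j in zip(row, y)) - b_i) <= tol]
    return matrix_rank(active) == len(active)


# Hypothetical constraints: y1 <= 0, y2 <= 0, y1 + y2 <= 0 (the last is redundant).
A = [[1, 0], [0, 1], [1, 1]]
b = [0, 0, 0]
print(licq_holds(A, b, [0, 0]))    # all three active, rank 2 < 3 -> False
print(licq_holds(A, b, [0, -1]))   # only y1 <= 0 active -> True
```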
1. This paper provides finite-time convergence guarantees in the stochastic setting with linear constraints and only first-order access. 2. The paper also has strong theoretical grounding (bias/variance analysis and Goldstein stationarity), and the proposed method has superior scalability for high-dimensional problems.
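Goldstein stationarity is the natural target when the hyperobjective is nonsmooth, which can happen when lower-level constraints switch between active and inactive. A point x is (delta, eps)-Goldstein stationary if some convex combination of gradients taken at points within distance delta of x has norm at most eps. The hypothetical sketch below approximates this measure by sampling gradients on a grid, using the derivative of the toy value F(x) = f(x, y*(x)) for f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2 and y*(x) = argmin over y <= 0 of 0.5*(y - x)**2. The toy problem, the grid approximation, and the function names are assumptions, not the paper's construction.

```python
def grad_F(x):
    """Derivative of the toy value F(x) = f(x, y*(x)); F has a kink at x = 0."""
    # x < 0: y*(x) = x, F = 0.5*(x - 1)**2 + 0.5*x**2, so F' = 2x - 1
    # x > 0: y*(x) = 0, F = 0.5 + 0.5*x**2,           so F' = x
    return 2.0 * x - 1.0 if x < 0 else x


def goldstein_measure(x, delta, n=201):
    """Distance from 0 to the convex hull of gradients sampled on [x - delta, x + delta]."""
    zs = [x - delta + 2.0 * delta * i / (n - 1) for i in range(n)]
    grads = [grad_F(z) for z in zs if z != 0.0]  # skip the nondifferentiable point
    lo, hi = min(grads), max(grads)
    # In one dimension the convex hull of the sampled gradients is [lo, hi].
    if lo <= 0.0 <= hi:
        return 0.0
    return min(abs(lo), abs(hi))


for x in (0.0, 0.5, -1.0):
    print(x, goldstein_measure(x, delta=0.1))
```

The minimizer x = 0 has no gradient, yet its measure is 0 because gradients just to its left (near -1) and right (near 0) bracket zero. Smooth points away from the kink, such as x = 0.5, have a measure near |F'(x)| minus a term of order delta.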
1. This paper claims to provide the first finite-time convergence guarantees. However, there are several works on constraints in bilevel optimization, such as "Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions." Could the authors provide a comparison? 2. Assumption 3.1(ii) appears to require the lower-level function $g$ to be strongly convex and also to have a bounded gradient. Can the authors verify that this assumption can be satisfied? 3. This paper also em
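The exact statement of Assumption 3.1(ii) is not reproduced on this page, so this is only a sketch of why the second point matters. Suppose g(x, ·) is μ-strongly convex in y on all of ℝ^m, and the bounded-gradient condition is imposed on ∇_y g over that unbounded domain. Then the two conditions are incompatible, as the derivation below shows. They can coexist if the lower-level domain is bounded (for example, a bounded feasible polytope) or if the bound applies only to ∇_x g.

```latex
% Assume g(x,\cdot) is \mu-strongly convex in y on all of \mathbb{R}^m; fix any y'.
\begin{align*}
\mu \|y - y'\|^2
  &\le \langle \nabla_y g(x,y) - \nabla_y g(x,y'),\; y - y' \rangle
   \le \|\nabla_y g(x,y) - \nabla_y g(x,y')\| \, \|y - y'\| , \\
\text{hence}\quad
\|\nabla_y g(x,y)\|
  &\ge \|\nabla_y g(x,y) - \nabla_y g(x,y')\| - \|\nabla_y g(x,y')\|
   \ge \mu \|y - y'\| - \|\nabla_y g(x,y')\|
   \;\to\; \infty \quad \text{as } \|y\| \to \infty .
\end{align*}
```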
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis · Risk and Portfolio Optimization
