Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Lesi Chen; Yaohua Ma; Jingzhao Zhang

arXiv:2306.14853 · math.OC · March 25, 2026

TL;DR

This paper proposes a fully first-order method for bilevel optimization with strongly convex lower-level problems, achieving near-optimal convergence rates without Hessian-vector products, and extends to stochastic and accelerated settings.

Contribution

The paper adds a two-time-scale update to the first-order method of Kwon et al. (ICML 2023), improving its first-order oracle complexity from $\tilde{O}(ϵ^{-3})$ to the near-optimal $\tilde{O}(ϵ^{-2})$. The analysis also covers stochastic and accelerated variants.
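
For orientation, the problem class can be written out as follows. The symbols $f$, $g$, $φ$, $y^*$, $y_λ^*$, and $λ$ are notation assumed here for illustration and do not appear on this page. The second display is the penalty-based gradient surrogate commonly used in fully first-order bilevel methods, shown as a sketch rather than as the paper's exact update.

$$
\min_{x} \; φ(x) := f\bigl(x, y^*(x)\bigr), \qquad y^*(x) := \arg\min_{y} \; g(x, y), \quad g(x, \cdot) \text{ strongly convex}
$$

$$
\nabla φ(x) \approx \nabla_x f\bigl(x, y_λ^*(x)\bigr) + λ \Bigl( \nabla_x g\bigl(x, y_λ^*(x)\bigr) - \nabla_x g\bigl(x, y^*(x)\bigr) \Bigr), \qquad y_λ^*(x) := \arg\min_{y} \; f(x, y) + λ\, g(x, y)
$$

Only gradients of $f$ and $g$ appear in the surrogate, so no Hessian-vector product is needed. An $ϵ$-stationary point is a point $x$ with $\|\nabla φ(x)\| \le ϵ$.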

Findings

01

Achieves $\tilde{O}(ϵ^{-2})$ first-order oracle complexity in the deterministic case.

02

Attains $\tilde{O}(ϵ^{-4})$ and $\tilde{O}(ϵ^{-6})$ stochastic first-order oracle complexities in the stochastic setting.

03

Provides an accelerated rate of $\tilde{O}(ϵ^{-1.75})$ under higher-order smoothness, with noise injection.

Abstract

In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $ϵ$-stationary point within $O(ϵ^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{O}(ϵ^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde{O}(ϵ^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde{O}(ϵ^{-4})$ and $\tilde{O}(ϵ^{-6})$ …
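
To make the idea of a Hessian-free bilevel update concrete, here is a minimal sketch on a toy scalar problem. It is not the paper's algorithm or its step-size schedule. The objectives `grad_f` and `grad_g`, the function name `first_order_bilevel`, the penalty weight `lam`, and all step sizes are assumptions chosen for illustration. The inner variables `y` and `z` are updated with their own step sizes, the outer variable `x` with a separate one, and the outer direction uses only gradients of the two objectives.

```python
# Toy bilevel problem (hypothetical, for illustration only):
#   upper level: f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2
#   lower level: g(x, y) = 0.5 * (y - x)^2   (strongly convex in y, y*(x) = x)
# Hyper-objective: phi(x) = f(x, y*(x)) = 0.5 * (x - 1)^2 + 0.5 * x^2, minimized at x = 0.5.

def grad_f(x, y):
    """Return (df/dx, df/dy) of the upper-level objective."""
    return x, y - 1.0

def grad_g(x, y):
    """Return (dg/dx, dg/dy) of the lower-level objective."""
    return x - y, y - x

def first_order_bilevel(x0=0.0, lam=100.0, outer_steps=100, inner_steps=5, eta=0.25):
    """Penalty-based bilevel descent using only gradients of f and g."""
    x, y, z = x0, 0.0, 0.0
    alpha_z = 1.0                # inner step size for z (g is 1-smooth in y)
    alpha_y = 1.0 / (1.0 + lam)  # inner step size for y (f + lam * g is (1 + lam)-smooth in y)
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # z tracks argmin_y g(x, y)
            _, gz = grad_g(x, z)
            z -= alpha_z * gz
            # y tracks argmin_y f(x, y) + lam * g(x, y)
            _, fy = grad_f(x, y)
            _, gy = grad_g(x, y)
            y -= alpha_y * (fy + lam * gy)
        # Hessian-free surrogate for the hyper-gradient
        fx, _ = grad_f(x, y)
        gx_y, _ = grad_g(x, y)
        gx_z, _ = grad_g(x, z)
        x -= eta * (fx + lam * (gx_y - gx_z))
    return x

x_hat = first_order_bilevel()
print(x_hat)
```

On this toy problem the iterate settles at `lam / (1 + 2 * lam)`, which approaches the true minimizer 0.5 as `lam` grows. This illustrates the bias introduced by a finite penalty weight.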

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning