Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles
Lesi Chen, Yaohua Ma, Jingzhao Zhang

TL;DR
This paper proposes a fully first-order method for bilevel optimization with strongly convex lower-level problems, achieving near-optimal convergence rates without Hessian-vector products, and extends to stochastic and accelerated settings.
Contribution
The paper adds a two-time-scale update to the fully first-order method of Kwon et al. (ICML 2023), improving its first-order oracle complexity to a near-optimal level for bilevel problems, and gives stochastic and accelerated variants.
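For orientation, the penalty reformulation that fully first-order bilevel methods of this kind are usually built on can be written as below. This is a sketch under assumed notation, not a quotation from the paper: the symbols $f$ (upper-level objective), $g$ (lower-level objective, strongly convex in $y$), $\varphi$, $\lambda$, and $\mathcal{L}_\lambda$ are all introduced here for illustration.

```latex
% Hyper-objective of the bilevel problem
\varphi(x) = f\bigl(x, y^*(x)\bigr), \qquad y^*(x) = \arg\min_{y} g(x, y)

% Penalty surrogate with multiplier \lambda > 0
\mathcal{L}_\lambda(x) = \min_{y} \Bigl\{ f(x, y) + \lambda \bigl( g(x, y) - \min_{z} g(x, z) \bigr) \Bigr\}

% Its gradient uses only first-order information of f and g
\nabla \mathcal{L}_\lambda(x) = \nabla_x f\bigl(x, y_\lambda^*(x)\bigr)
  + \lambda \Bigl( \nabla_x g\bigl(x, y_\lambda^*(x)\bigr) - \nabla_x g\bigl(x, y^*(x)\bigr) \Bigr),
\qquad y_\lambda^*(x) = \arg\min_{y} \bigl\{ f(x, y) + \lambda\, g(x, y) \bigr\}
```

No Hessian-vector product appears in the last line, which is why the surrogate gradient can stand in for $\nabla \varphi(x)$ when $\lambda$ is large.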
Findings
Achieves $\tilde{\mathcal{O}}(\epsilon^{-2})$ oracle complexity in the deterministic case.
Attains $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ complexities in stochastic settings.
Provides an accelerated rate of $\tilde{\mathcal{O}}(\epsilon^{-1.75})$ with higher-order smoothness and noise injection.
Abstract
In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\epsilon$-stationary point within $\tilde{\mathcal{O}}(\epsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde{\mathcal{O}}(\epsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ …
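To make the idea of a first-order method without HVPs concrete, here is a minimal sketch of a penalty-based bilevel loop on a toy quadratic problem. It is an illustration under stated assumptions, not the authors' algorithm: the toy objectives, the vector `b`, the function name `first_order_bilevel`, and all step sizes are hypothetical choices made here. The lower level is $g(x, y) = \tfrac12\|y - x\|^2$ and the upper level is $f(x, y) = \tfrac12\|y - b\|^2 + \tfrac12\|x\|^2$, so the hyper-objective is minimized at $x = b/2$.

```python
import numpy as np

b = np.array([1.0, -2.0])  # hypothetical target vector for the toy problem

# Toy upper level: f(x, y) = 0.5*||y - b||^2 + 0.5*||x||^2
def grad_f_x(x, y):
    return x

def grad_f_y(x, y):
    return y - b

# Toy lower level: g(x, y) = 0.5*||y - x||^2, strongly convex in y, y*(x) = x
def grad_g_x(x, y):
    return x - y

def grad_g_y(x, y):
    return y - x

def first_order_bilevel(x0, lam=100.0, outer_steps=50, inner_steps=20,
                        eta=0.5, alpha=0.5):
    """Penalty-based bilevel loop using gradients only (no Hessians).

    z tracks argmin_y g(x, y); y tracks argmin_y f(x, y) + lam * g(x, y).
    Step sizes are tuned by hand for this toy problem (an assumption).
    """
    x, y, z = x0.copy(), x0.copy(), x0.copy()
    for _ in range(outer_steps):
        # Fast time scale: warm-started gradient steps on both inner problems.
        for _ in range(inner_steps):
            z = z - alpha * grad_g_y(x, z)
            # The penalized inner problem is (1 + lam)-smooth here,
            # so its step size is scaled down accordingly.
            y = y - alpha / (1.0 + lam) * (grad_f_y(x, y) + lam * grad_g_y(x, y))
        # Slow time scale: surrogate hypergradient built from first-order terms.
        hypergrad = grad_f_x(x, y) + lam * (grad_g_x(x, y) - grad_g_x(x, z))
        x = x - eta * hypergrad
    return x

x_hat = first_order_bilevel(np.zeros(2))
print(x_hat)  # close to b / 2, up to an O(1 / lam) penalty bias
```

On this toy problem the surrogate has the closed-form stationary point $\lambda b / (1 + 2\lambda)$, so the gap to the true minimizer $b/2$ shrinks as the penalty multiplier `lam` grows; larger `lam` also makes the penalized inner problem stiffer, which is the trade-off that the choice of step sizes has to manage.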
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
