On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis

Lesi Chen; Jing Xu; Jingzhao Zhang

arXiv:2301.00712·math.OC·April 29, 2026

TL;DR

This paper studies how hard it is to find points with small hyper-gradients in bilevel optimization. It gives hardness results for the nonconvex-convex case and an improved first-order method with near-optimal complexity bounds when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition.

Contribution

It establishes intractability results for general nonconvex-convex bilevel problems and introduces a simple first-order method with improved complexity bounds for problems satisfying the PL condition.

Findings

01

Hardness results show intractability for zero-respecting algorithms in nonconvex-convex bilevel problems.

02

A first-order algorithm achieves near-optimal complexity bounds for nonconvex-nonconvex bilevel problems whose lower-level function satisfies the PL condition.

03

Complexity bounds are $\tilde{O}(\frac{1}{\epsilon^2})$, $\tilde{O}(\frac{1}{\epsilon^4})$, and $\tilde{O}(\frac{1}{\epsilon^6})$ in different stochastic settings.
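This page does not spell out the algorithm, so the following is only a minimal sketch of one common way to estimate a hyper-gradient with first-order information: a penalty reformulation that compares the lower-level minimizer with the minimizer of a penalized objective. It is not claimed to be the paper's method. The toy functions `f` and `g`, the penalty weight `lam`, and all step sizes and iteration counts are assumptions chosen for illustration. The lower level here is strongly convex, which is a special case of the PL condition.

```python
def grad_f(x, y):
    """Gradients (d/dx, d/dy) of the toy upper level f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2."""
    return x, y - 1.0


def grad_g(x, y):
    """Gradients (d/dx, d/dy) of the toy lower level g(x, y) = 0.5*(y - x)**2."""
    return -(y - x), y - x


def inner_gd(grad_y, y0, step, iters):
    """Plain gradient descent in y, using only first-order information."""
    y = y0
    for _ in range(iters):
        y -= step * grad_y(y)
    return y


def penalty_hypergrad(x, lam=1000.0, iters=50):
    """Penalty-based estimate of the hyper-gradient at x (hypothetical helper).

    y_star approximately minimizes g(x, .);
    y_lam approximately minimizes f(x, .) + lam * g(x, .).
    """
    y_star = inner_gd(lambda y: grad_g(x, y)[1], 0.0, 1.0, iters)
    y_lam = inner_gd(
        lambda y: grad_f(x, y)[1] + lam * grad_g(x, y)[1],
        0.0,
        1.0 / (1.0 + lam),  # inverse smoothness constant of the penalized problem
        iters,
    )
    return grad_f(x, y_lam)[0] + lam * (grad_g(x, y_lam)[0] - grad_g(x, y_star)[0])


def outer_gd(x0=0.0, step=0.25, iters=100):
    """Gradient descent on the upper-level variable using the estimate above."""
    x = x0
    for _ in range(iters):
        x -= step * penalty_hypergrad(x)
    return x


if __name__ == "__main__":
    # For this toy problem y*(x) = x, so the hyper-objective is
    # 0.5*(x - 1)**2 + 0.5*x**2, with derivative 2*x - 1 and minimizer x = 0.5.
    print(penalty_hypergrad(0.0))
    print(outer_gd())
```

In this toy case the estimate differs from the true hyper-gradient $2x - 1$ by a term of order $1/\text{lam}$, so a larger penalty weight gives a more accurate estimate at the price of a less well-conditioned inner problem.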

Abstract

Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show a simple first-order…
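For reference, the objects named in the abstract are usually written as follows. The notation ($f$ for the upper-level function, $g$ for the lower-level function, $\varphi$ for the hyper-objective, $\mu$ for the PL constant) is assumed here and does not come from this page; the paper's own definitions may differ in detail.

```latex
% Hyper-objective: upper-level value at the best lower-level solution
\varphi(x) = \min_{y \in Y^*(x)} f(x, y),
\qquad
Y^*(x) = \operatorname*{arg\,min}_{y} \; g(x, y).

% PL condition on the lower-level function, with constant \mu > 0
\frac{1}{2} \, \| \nabla_y g(x, y) \|^2
\;\ge\;
\mu \Bigl( g(x, y) - \min_{y'} g(x, y') \Bigr)
\quad \text{for all } x, y.

% Target: an \epsilon-stationary point of the hyper-objective
\| \nabla \varphi(x) \| \le \epsilon.
```

Strong convexity of $g(x, \cdot)$ implies the PL condition, but PL functions need not be convex, which is why the tractable class is described as nonconvex-nonconvex.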
