On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
Lesi Chen, Jing Xu, Jingzhao Zhang

TL;DR
This paper investigates the computational difficulty of finding small hyper-gradients in bilevel optimization, providing hardness results and proposing improved algorithms with near-optimal complexity bounds under certain conditions.
Contribution
It establishes intractability results for general nonconvex-convex bilevel problems and introduces a simple first-order method with improved complexity bounds for problems satisfying the PL condition.
Findings
Hardness results show intractability for zero-respecting algorithms in nonconvex-convex bilevel problems.
A first-order algorithm achieves near-optimal complexity bounds for nonconvex-nonconvex bilevel problems whose lower-level function satisfies the PL condition.
Complexity bounds are $\tilde{O}(\frac{1}{\epsilon^2})$, $\tilde{O}(\frac{1}{\epsilon^4})$, and $\tilde{O}(\frac{1}{\epsilon^6})$ in different stochastic settings.
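To make the PL condition in the findings above concrete, here is a minimal sketch, not taken from the paper. It numerically checks the standard PL inequality $\frac{1}{2}\|\nabla g(y)\|^2 \ge \mu\,(g(y) - g^*)$ for the commonly cited function $g(y) = y^2 + 3\sin^2(y)$, which is nonconvex. The function, the constant `MU = 1/32`, and the grid are illustrative assumptions; the check covers only the sampled points and proves nothing beyond them.

```python
import math

# Illustrative lower-level function (an assumption, not from the paper):
# g(y) = y^2 + 3 sin^2(y), whose global minimum is g* = 0 at y = 0.
def g(y):
    return y * y + 3.0 * math.sin(y) ** 2

def grad_g(y):
    # d/dy [y^2 + 3 sin^2(y)] = 2y + 3 sin(2y)
    return 2.0 * y + 3.0 * math.sin(2.0 * y)

def hess_g(y):
    # Second derivative: 2 + 6 cos(2y); negative values mean g is not convex.
    return 2.0 + 6.0 * math.cos(2.0 * y)

MU = 1.0 / 32.0   # assumed PL constant for this example
G_STAR = 0.0      # minimum value of g

# Sample y on a uniform grid over [-10, 10].
grid = [-10.0 + 0.001 * i for i in range(20001)]

# PL inequality: 0.5 * |g'(y)|^2 >= MU * (g(y) - g*)
pl_holds_on_grid = all(
    0.5 * grad_g(y) ** 2 >= MU * (g(y) - G_STAR) for y in grid
)

# Nonconvexity: the second derivative is negative somewhere on the grid.
is_nonconvex = any(hess_g(y) < 0.0 for y in grid)

print("PL inequality holds on grid:", pl_holds_on_grid)
print("g is nonconvex on grid:", is_nonconvex)
```

The point of the example is that a function can satisfy the PL inequality without being convex, which is the kind of lower-level structure the nonconvex-nonconvex setting allows.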
Abstract
Bilevel optimization reveals the inner structure of otherwise oblique optimization problems, such as hyperparameter tuning, neural architecture search, and meta-learning. A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level function. Although this hyper-objective approach is widely used, its theoretical properties have not been thoroughly investigated in cases where the lower-level functions lack strong convexity. In this work, we first provide hardness results to show that the goal of finding stationary points of the hyper-objective for nonconvex-convex bilevel optimization can be intractable for zero-respecting algorithms. Then we study a class of tractable nonconvex-nonconvex bilevel problems when the lower-level function satisfies the Polyak-Łojasiewicz (PL) condition. We show a simple first-order…
