Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets
Saeed Masiha, Zebang Shen, Negar Kiyavash, Niao He

TL;DR
This paper introduces HG-MS, a method for bilevel optimization whose lower-level problem has a non-isolated solution set; it shows when the hyper-objective remains differentiable and achieves efficient convergence even in non-convex settings.
Contribution
It extends hyper-gradient formulas to manifold solution sets and proposes a practical select-then-differentiate algorithm with convergence guarantees.
Findings
HG-MS converges to stationary points with complexity depending on the solution manifold's intrinsic dimension.
The practical variant of HG-MS achieves top scores on GSM8K and MATH benchmarks.
Empirical results demonstrate the effectiveness of the method in large language model reweighting.
Abstract
We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak-Łojasiewicz (PŁ) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive Hölder regularity of the…
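The pseudoinverse-based hyper-gradient mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's HG-MS algorithm; it assumes the natural pseudoinverse form ∇F(x) = ∇_x f − ∇²_{xy} g · (∇²_{yy} g)† · ∇_y f, evaluated at the optimistic selection, and checks it against a finite difference. The problem itself (the vectors `a` and `c`, the functions `select`, `hypergrad`, and `F`) is hypothetical and chosen only so that the lower-level solution set is a line rather than a single point.

```python
import numpy as np

# Hypothetical toy problem (not from the paper):
#   lower level: g(x, y) = 0.5 * (a @ y - x)**2, minimizers form the line {y : a @ y = x}
#   upper level: f(x, y) = 0.5 * ||y - c||**2
a = np.array([1.0, 2.0])
c = np.array([3.0, -1.0])

def select(x):
    """Optimistic selection: the minimizer of f over the lower-level solution set,
    i.e. the projection of c onto the line {y : a @ y = x}. It is unique here."""
    return c + a * (x - a @ c) / (a @ a)

def hypergrad(x):
    """Assumed pseudoinverse form of the hyper-gradient at the selected minimizer."""
    y = select(x)
    grad_x_f = 0.0                 # f does not depend on x directly
    grad_y_f = y - c               # gradient of f with respect to y
    hess_yy_g = np.outer(a, a)     # rank-1 (singular) Hessian of g with respect to y
    cross_xy_g = -a                # derivative of grad_y g with respect to x
    return grad_x_f - cross_xy_g @ (np.linalg.pinv(hess_yy_g) @ grad_y_f)

def F(x):
    """Hyper-objective: f evaluated at the optimistic selection."""
    y = select(x)
    return 0.5 * np.sum((y - c) ** 2)

x0 = 2.0
eps = 1e-6
finite_diff = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)
print(hypergrad(x0), finite_diff)
```

The Hessian of g in y is singular, so the classical inverse-based formula does not apply, yet the pseudoinverse form agrees with the finite difference because the selection is unique.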
