BROS: Bias-Corrected Randomized Subspaces for Memory-Efficient Single-Loop Bilevel Optimization
Hengrui Zhang, Boao Kong, Engao Zhang, Kun Yuan

TL;DR
BROS is a memory-efficient single-loop stochastic bilevel optimization method that maintains competitive convergence guarantees and reduces peak memory usage in deep learning tasks.
Contribution
BROS introduces a randomized-subspace approach with unbiased Hessian-action estimation that matches the convergence-rate order of exact single-loop SBO methods.
Findings
Reduces peak memory by up to 44.9% in experiments.
Maintains the same convergence rate as exact single-loop SBO methods.
Effectively applies to hyper-data cleaning, data-mixture learning, and ViT sample reweighting.
Abstract
Stochastic bilevel optimization (SBO) has become a standard framework for hyperparameter learning, data reweighting, representation learning, and data-mixture optimization in deep learning. Existing exact single-loop SBO methods create severe memory pressure for large lower-level neural networks, while memory-efficient surrogate SBO methods lack competitive convergence guarantees under standard assumptions. In this paper, we propose BROS, a memory-efficient single-loop SBO method with the same convergence rate order as exact single-loop SBO methods. BROS performs lower and auxiliary updates in randomized subspaces with a Rademacher bi-probe correction that recovers an unbiased Hessian-action estimator. We prove that BROS preserves the sample complexity of MA-SOBA for finding an ε-stationary point under only standard assumptions.…
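The abstract does not spell out the bi-probe estimator, so the following is only a toy sketch of the general idea it alludes to, not BROS itself. It assumes rank-one Rademacher probes and a small dense symmetric matrix `H` standing in for a Hessian; the helper names (`all_rademacher`, `single_probe_mean`, `two_probe_mean`) are hypothetical. Because the dimension is tiny, the expectations are computed exactly by enumerating every sign vector. Reusing one probe `z` on both sides of `H` gives a biased estimate of the Hessian action `H v`, whereas two independent probes `z` and `w` give an unbiased one, since `E[z z^T] = I`.

```python
import itertools
import numpy as np


def all_rademacher(d):
    """Every sign vector in {-1, +1}^d, one per row (2**d rows)."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=d)))


def single_probe_mean(H, v):
    """Exact expectation of z z^T H z z^T v over one Rademacher probe z."""
    Z = all_rademacher(len(v))
    return np.mean([z * (z @ H @ z) * (z @ v) for z in Z], axis=0)


def two_probe_mean(H, v):
    """Exact expectation of z z^T H w w^T v over independent probes z, w."""
    Z = all_rademacher(len(v))
    return np.mean([z * (z @ H @ w) * (w @ v) for z in Z for w in Z], axis=0)


# Toy symmetric matrix standing in for a Hessian (assumption, not from the paper).
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])
v = np.array([1.0, -2.0, 0.5])

exact = H @ v                    # true Hessian action
single = single_probe_mean(H, v) # same probe on both sides: biased
double = two_probe_mean(H, v)    # independent probes: unbiased

print("exact       :", exact)
print("single probe:", single)
print("two probes  :", double)
```

For symmetric `H`, the single-probe expectation works out to `(2H + tr(H) I - 2 diag(H)) v`, so its bias grows with the trace of `H`; the two-probe expectation factorizes as `E[z z^T] H E[w w^T] v = H v`.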
