Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization
Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

TL;DR
This paper introduces $(FG)^2U$, a memory-efficient, scalable, and accurate gradient unrolling method for large-scale bi-level optimization, enabling effective hierarchical machine learning model training.
Contribution
The paper presents $(FG)^2U$, a novel unbiased stochastic gradient approximation method that overcomes memory and approximation limitations of existing approaches in large-scale bi-level optimization.
Findings
$(FG)^2U$ achieves more accurate gradient estimates.
It supports parallel computing for efficiency.
It outperforms existing methods in large-scale tasks.
Abstract
Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce Forward Gradient Unrolling with Forward Gradient, abbreviated as $(FG)^2U$, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. $(FG)^2U$ circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient…
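To make the idea of an unbiased forward-gradient estimate of the meta gradient concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption chosen for illustration: the quadratic inner objective, the quadratic outer loss, the hand-derived tangent update, and the function names. It samples a random direction v, carries the tangent u_t = (d theta_t / d phi) v forward alongside the inner updates without storing the trajectory, and returns (directional derivative) * v, whose expectation is the meta gradient because E[v v^T] = I.

```python
import random


def forward_gradient_estimate(phi, target, lr=0.1, steps=20, rng=random):
    """One forward-gradient sample of d f(theta_T) / d phi on a toy problem.

    Assumed inner objective: g(theta, phi) = 0.5 * ||theta - phi||^2,
    minimized by `steps` gradient-descent updates from theta_0 = 0.
    Assumed outer loss: f(theta) = 0.5 * ||theta - target||^2.
    """
    d = len(phi)
    # Rademacher direction: entries are +1 or -1, so E[v v^T] = I.
    v = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    theta = [0.0] * d  # inner parameters theta_t
    u = [0.0] * d      # tangent u_t = (d theta_t / d phi) v
    for _ in range(steps):
        # Tangent update: derivative of the theta update below along v.
        u = [ui - lr * (ui - vi) for ui, vi in zip(u, v)]
        # Inner gradient step; only the current theta and u are kept.
        theta = [ti - lr * (ti - pi) for ti, pi in zip(theta, phi)]
    # Directional derivative of the outer loss along v.
    slope = sum((ti - gi) * ui for ti, gi, ui in zip(theta, target, u))
    return [slope * vi for vi in v]


def averaged_estimate(phi, target, n_samples, seed=0, lr=0.1, steps=20):
    """Average independent samples; each sample could run in parallel."""
    rng = random.Random(seed)
    total = [0.0] * len(phi)
    for _ in range(n_samples):
        est = forward_gradient_estimate(phi, target, lr, steps, rng)
        total = [a + b for a, b in zip(total, est)]
    return [a / n_samples for a in total]


def exact_hypergradient(phi, target, lr=0.1, steps=20):
    """Closed form for this toy problem: theta_T = c * phi."""
    c = 1.0 - (1.0 - lr) ** steps
    return [c * (c * p - g) for p, g in zip(phi, target)]
```

In this sketch, memory use does not grow with the number of inner steps, since only the current theta and its tangent are held. The cost is variance: a single sample is noisy in more than one dimension, so several directions are averaged, and those samples are independent of one another.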
Taxonomy
Topics: Computer Graphics and Visualization Techniques
