Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

Qianli Shen; Yezhen Wang; Zhouhao Yang; Xiang Li; Haonan Wang; Yang Zhang; Jonathan Scarlett; Zhanxing Zhu; Kenji Kawaguchi

arXiv:2406.14095·cs.LG·December 25, 2024

TL;DR

This paper introduces $(FG)^2U$, a memory-efficient, scalable, and accurate gradient unrolling method for large-scale bi-level optimization, enabling effective hierarchical machine learning model training.

Contribution

The paper presents $(FG)^2U$, a novel unbiased stochastic gradient approximation method that overcomes memory and approximation limitations of existing approaches in large-scale bi-level optimization.

Findings

01. $(FG)^2U$ achieves more accurate gradient estimates.
02. It supports parallel computing for efficiency.
03. It outperforms existing methods in large-scale tasks.

Abstract

Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce Forward Gradient Unrolling with Forward Gradient, abbreviated as $(FG)^{2}U$, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. $(FG)^{2}U$ circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient…
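
To make the idea of an unbiased stochastic approximation of the meta gradient more concrete, the following is a minimal JAX sketch of a generic forward-gradient estimator applied to an unrolled inner loop. It is not taken from the paper or its repository. The toy problem (learning an L2 regularization strength `phi`), the function names `inner_loss`, `outer_loss`, `unrolled_objective`, and `forward_gradient_estimate`, and all constants are hypothetical choices made for illustration. The sketch relies only on the standard identity that, for random directions $v$ with $E[vv^\top] = I$, the quantity $(\nabla h(\phi)^\top v)\,v$ has expectation $\nabla h(\phi)$, and that the directional derivative $\nabla h(\phi)^\top v$ can be obtained with a forward-mode Jacobian-vector product.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy bi-level problem, for illustration only:
#   inner: theta_{t+1} = theta_t - lr * grad_theta inner_loss(theta_t, phi)
#   outer: h(phi) = outer_loss(theta_T(phi))

def inner_loss(theta, phi):
    # Fit theta to a target of 1.0 with an L2 penalty of strength exp(phi).
    return 0.5 * jnp.sum((theta - 1.0) ** 2) + 0.5 * jnp.sum(jnp.exp(phi) * theta ** 2)

def outer_loss(theta):
    # Stand-in validation loss with a different target (assumed value 0.8).
    return 0.5 * jnp.sum((theta - 0.8) ** 2)

def unrolled_objective(phi, theta0, steps=20, lr=0.1):
    # Unroll `steps` gradient-descent updates of the inner problem.
    def step(theta, _):
        g = jax.grad(inner_loss)(theta, phi)
        return theta - lr * g, None
    theta_T, _ = jax.lax.scan(step, theta0, None, length=steps)
    return outer_loss(theta_T)

def forward_gradient_estimate(key, phi, theta0, num_directions=8):
    # Rademacher directions satisfy E[v v^T] = I, which makes the estimator unbiased.
    vs = jax.random.rademacher(key, (num_directions, phi.shape[0]), dtype=phi.dtype)

    def one_direction(v):
        # Forward-mode JVP gives the directional derivative <grad h(phi), v>
        # in a single forward pass through the unrolled loop.
        _, dir_deriv = jax.jvp(lambda p: unrolled_objective(p, theta0), (phi,), (v,))
        return dir_deriv * v

    # Directions are independent, so they can be evaluated in parallel and averaged.
    return jnp.mean(jax.vmap(one_direction)(vs), axis=0)

key = jax.random.PRNGKey(0)
phi = jnp.zeros(4)
theta0 = jnp.zeros(4)
estimate = forward_gradient_estimate(key, phi, theta0)
```

Because the tangent is propagated forward alongside the inner iterates, this sketch does not need to store the unrolled trajectory for a backward pass across steps. The price is variance in the estimate, which shrinks as more directions are averaged. How the paper chooses directions, batch sizes, and inner optimizers is not covered by the text above and is not assumed here.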

Code & Models

Repositories

shenqianli/fg2u (JAX, official)

Taxonomy

Topics: Computer Graphics and Visualization Techniques