A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning
Minyoung Kim, Timothy M. Hospedales

TL;DR
This paper introduces a stochastic reformulation of bi-level optimization problems in deep learning. It uses Stochastic Gradient Langevin Dynamics (SGLD) to sample the inner distribution and a new first-order approximation to improve scalability and robustness in hyperparameter optimization and meta learning tasks.
Contribution
It presents a new stochastic perspective on bi-level optimization, enabling scalable, robust solutions for large models and diverse meta learning applications.
Findings
Achieves promising results on various meta learning benchmarks.
Scales to learning 87 million hyperparameters in Vision Transformers.
Provides more stable and reliable solutions compared to existing methods.
Abstract
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning, including hyperparameter optimization, loss function learning, few-shot learning, invariance learning and more. These problems are often formalized as bi-level optimization (BLO) problems. We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution. To solve this stochastic optimization, we adopt Stochastic Gradient Langevin Dynamics (SGLD) MCMC to sample from the inner distribution, and propose a recurrent algorithm to compute the MC-estimated hypergradient. Our derivation is similar to forward-mode differentiation, but we introduce a new first-order approximation that makes it feasible for large models without needing to…
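The recipe in the abstract (inner loss as a distribution, SGLD sampling, forward-mode recurrence for the hypergradient) can be illustrated on a one-dimensional toy problem. This is a minimal sketch, not the authors' code. The Gibbs form p(θ|λ) ∝ exp(-f(θ,λ)/τ), the temperature τ, the quadratic losses and all function names are assumptions chosen for illustration, and the paper's first-order approximation is not reproduced, since the toy's tangent is a scalar and the exact recursion is cheap. The sketch runs SGLD on the inner loss, carries the tangent dθ/dλ forward through each update, and averages F'(θ)·dθ/dλ over post-burn-in samples. For this toy the expected outer loss is ½((λ - 3)² + Var θ) with Var θ independent of λ, so the exact hypergradient is λ - 3.

```python
import math
import random

# Hypothetical toy bi-level problem, chosen so the answer is known in closed form:
#   inner loss  f(theta, lam) = 0.5 * (theta - lam)**2
#   outer loss  F(theta)      = 0.5 * (theta - 3)**2
# Assumed inner distribution: p(theta | lam) proportional to exp(-f(theta, lam) / tau).


def inner_grad(theta: float, lam: float) -> float:
    """Partial derivative of f with respect to theta."""
    return theta - lam


# Second derivatives of f (constant for this quadratic toy).
D2F_DTHETA2 = 1.0       # d^2 f / d theta^2
D2F_DTHETA_DLAM = -1.0  # d^2 f / (d theta d lam)


def outer_grad(theta: float) -> float:
    """Derivative of F with respect to theta (F has no direct dependence on lam)."""
    return theta - 3.0


def sgld_hypergradient(lam: float, steps: int = 50_000, burn_in: int = 5_000,
                       eta: float = 0.01, tau: float = 0.1, seed: int = 0) -> float:
    """Monte Carlo estimate of d/d lam of E_{p(theta|lam)}[F(theta)].

    Runs SGLD on the inner loss while carrying the tangent J = d theta / d lam
    through every update (forward-mode differentiation of the chain), then
    averages F'(theta_k) * J_k over the post-burn-in samples.
    """
    rng = random.Random(seed)
    theta, jac = 0.0, 0.0
    total, count = 0.0, 0
    for k in range(steps):
        noise = rng.gauss(0.0, 1.0)
        # Differentiate the SGLD update w.r.t. lam; the injected noise does not depend on lam.
        new_jac = jac - eta * (D2F_DTHETA2 * jac + D2F_DTHETA_DLAM)
        # SGLD step targeting exp(-f / tau): gradient step on f plus scaled Gaussian noise.
        theta = theta - eta * inner_grad(theta, lam) + math.sqrt(2.0 * eta * tau) * noise
        jac = new_jac
        if k >= burn_in:
            total += outer_grad(theta) * jac
            count += 1
    return total / count


if __name__ == "__main__":
    lam = 1.0
    print(f"MC hypergradient: {sgld_hypergradient(lam):.3f}, exact: {lam - 3.0:.3f}")
```

The Monte Carlo estimate should land close to the exact value λ - 3. In a real network, θ and λ are high-dimensional and the tangent becomes a dim(θ) × dim(λ) Jacobian, which is far more expensive to carry; the abstract's first-order approximation is aimed at making this kind of recurrence feasible for large models.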
