A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning

Minyoung Kim; Timothy M. Hospedales

arXiv:2410.10417·cs.LG·October 15, 2024

TL;DR

This paper reformulates bi-level optimization problems in deep learning as stochastic optimization, using SGLD to sample the inner distribution and a new first-order approximation to improve scalability and robustness in hyperparameter optimization and meta learning tasks.

Contribution

It presents a new stochastic perspective on bi-level optimization, enabling scalable, robust solutions for large models and diverse meta learning applications.

Findings

01 Achieves promising results on various meta learning benchmarks.

02 Scales to learning 87 million hyperparameters in Vision Transformers.

03 Provides more stable and reliable solutions than existing methods.

Abstract

We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning, including hyperparameter optimization, loss function learning, few-shot learning, invariance learning and more. These problems are often formalized as bi-level optimization (BLO) problems. We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution. To solve this stochastic optimization, we adopt Stochastic Gradient Langevin Dynamics (SGLD) MCMC to sample from the inner distribution, and propose a recurrent algorithm to compute the MC-estimated hypergradient. Our derivation is similar to forward-mode differentiation, but we introduce a new first-order approximation that makes it feasible for large models without needing to…
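To make the reformulation in the abstract concrete, here is a minimal sketch on a toy scalar problem. Everything in it is an assumption chosen for illustration and not taken from the paper: the quadratic inner and outer losses, the temperature, the step size, and the function names. The inner loss 0.5 * (theta - lam)^2 is turned into a distribution proportional to exp(-inner_loss / temperature), SGLD draws samples from it, and the outer loss and its gradient with respect to the hyperparameter lam are averaged over those samples. The sensitivity d theta / d lam is propagated with plain, exact forward-mode differentiation, which works here only because the problem is one-dimensional; it is not the paper's first-order approximation.

```python
import math
import random


def inner_grad(theta, lam):
    # Gradient w.r.t. theta of the assumed inner loss 0.5 * (theta - lam)^2.
    return theta - lam


def outer_loss(theta, target):
    # Assumed outer loss 0.5 * (theta - target)^2.
    return 0.5 * (theta - target) ** 2


def sgld_expected_outer(lam, target, temperature=0.1, step=0.05,
                        n_steps=20000, burn_in=2000, seed=0):
    """Monte Carlo estimates under the SGLD chain for a toy bi-level problem.

    Returns (mean of theta, mean outer loss, mean hypergradient w.r.t. lam).
    """
    rng = random.Random(seed)
    theta = 0.0   # inner variable, initialised independently of lam
    sens = 0.0    # forward-mode sensitivity d theta / d lam
    noise_scale = math.sqrt(2.0 * step * temperature)
    theta_sum = loss_sum = grad_sum = 0.0
    kept = 0
    for t in range(n_steps):
        # SGLD update: gradient step on the inner loss plus Gaussian noise.
        theta = (theta - step * inner_grad(theta, lam)
                 + noise_scale * rng.gauss(0.0, 1.0))
        # Differentiate the update w.r.t. lam; the noise does not depend on lam.
        sens = (1.0 - step) * sens + step
        if t >= burn_in:
            theta_sum += theta
            loss_sum += outer_loss(theta, target)
            # Chain rule: d outer / d lam = (theta - target) * d theta / d lam.
            grad_sum += (theta - target) * sens
            kept += 1
    return theta_sum / kept, loss_sum / kept, grad_sum / kept


if __name__ == "__main__":
    mean_theta, exp_loss, hypergrad = sgld_expected_outer(lam=2.0, target=0.5)
    print(mean_theta, exp_loss, hypergrad)
```

In this toy setting the inner distribution is Gaussian with mean lam and variance close to the temperature, so the estimates can be checked against closed forms: the expected outer loss is roughly 0.5 * ((lam - target)^2 + temperature) and the hypergradient is roughly lam - target. For large models, storing a full sensitivity of every parameter with respect to every hyperparameter is the cost that motivates an approximation.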

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Heat Transfer and Optimization