Scalable Meta-Learning via Mixed-Mode Differentiation

Iurii Kemaev; Dan A Calian; Luisa M Zintgraf; Gregory Farquhar; Hado van Hasselt

arXiv:2505.00793 · cs.LG · June 11, 2025

TL;DR

This paper introduces MixFlow-MG, a mixed-mode differentiation algorithm that improves the efficiency and scalability of gradient-based bilevel optimization in meta-learning, reducing both memory usage (by over 10x) and wall-clock time (by up to 25%).

Contribution

The paper proposes MixFlow-MG, a mixed-mode differentiation approach tailored to gradient-based bilevel optimization, making meta-learning applications more efficient and scalable.

Findings

01

Over 10x memory reduction compared to standard methods

02

Up to 25% wall-clock time improvement

03

Effective in modern meta-learning setups

Abstract

Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25%…
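The "gradient-of-a-gradient" structure described above can be illustrated with a minimal JAX sketch. It assumes a hypothetical toy bilevel problem: least-squares regression with per-coordinate log weight decay `eta` as the meta-parameters, a single inner SGD step, and illustrative names (`inner_loss`, `outer_loss`, `meta_grad_reverse`, `meta_grad_mixed`) that do not come from the paper. With θ' = θ - α ∇_θ L_in(θ, η) and v = ∇_θ L_out(θ'), the meta-gradient is dL_out(θ')/dη = -α ∂_η(vᵀ ∇_θ L_in(θ, η)), with v held fixed. The sketch compares plain reverse-over-reverse differentiation with one common mixed-mode pattern (reverse mode applied over a forward-mode JVP). This is not the authors' MixFlow-MG implementation, and a problem this small only shows that both routes give the same meta-gradient, not the memory savings reported in the paper.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy bilevel problem (illustration only):
# inner loss = training MSE + per-coordinate weight decay exp(eta_i) * theta_i^2,
# outer loss = validation MSE after one SGD step on the inner loss.
def inner_loss(theta, eta, x, y):
    return jnp.mean((x @ theta - y) ** 2) + jnp.sum(jnp.exp(eta) * theta ** 2)

def outer_loss(theta, x, y):
    return jnp.mean((x @ theta - y) ** 2)

def inner_step(theta, eta, x, y, lr):
    # One SGD step; its output depends on eta through grad_theta(inner_loss).
    return theta - lr * jax.grad(inner_loss)(theta, eta, x, y)

def meta_grad_reverse(theta, eta, train, val, lr=0.1):
    # Baseline: reverse-over-reverse, i.e. jax.grad through a jax.grad call.
    def meta_objective(eta):
        return outer_loss(inner_step(theta, eta, *train, lr), *val)
    return jax.grad(meta_objective)(eta)

def meta_grad_mixed(theta, eta, train, val, lr=0.1):
    # Reverse mode for the outer gradient v = dL_out/dtheta' (a plain gradient).
    theta_new = inner_step(theta, eta, *train, lr)
    v = jax.grad(outer_loss)(theta_new, *val)

    def directional_deriv(eta):
        # Forward-mode JVP: derivative of L_in w.r.t. theta along v, a scalar
        # equal to v^T grad_theta L_in(theta, eta).
        _, dl = jax.jvp(lambda th: inner_loss(th, eta, *train), (theta,), (v,))
        return dl

    # Reverse mode over the forward-mode JVP gives v^T d/deta[grad_theta L_in];
    # theta itself does not depend on eta, so this is the full meta-gradient.
    return -lr * jax.grad(directional_deriv)(eta)

k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
theta = jax.random.normal(k1, (3,))
eta = jnp.full((3,), -2.0)  # log weight decay per coordinate
train = (jax.random.normal(k2, (16, 3)), jax.random.normal(k3, (16,)))
val = (jax.random.normal(k4, (16, 3)), jax.random.normal(k5, (16,)))

g_rev = meta_grad_reverse(theta, eta, train, val)
g_mix = meta_grad_mixed(theta, eta, train, val)
print(g_rev, g_mix)  # should agree up to float32 rounding
```

Expressing vᵀ∇_θ L_in as a JVP avoids nesting one reverse pass inside another. Whether that pays off in memory or time depends on the model and the unrolled horizon, which is the kind of case the paper analyses.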

Taxonomy

Topics: Flow Measurement and Analysis · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning