Scalable Meta-Learning via Mixed-Mode Differentiation
Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, Gregory Farquhar, Hado van Hasselt

TL;DR
This paper introduces MixFlow-MG, a mixed-mode differentiation algorithm that makes gradient-based bilevel optimization in meta-learning more efficient and scalable, cutting memory usage by over 10x and improving wall-clock time by up to 25%.
Contribution
The paper proposes MixFlow-MG, a mixed-mode differentiation approach tailored to bilevel optimization that builds more efficient computational graphs and so lets meta-learning applications scale further.
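As a rough illustration of what "mixed-mode" means here, the sketch below computes the same Hessian-vector product in two ways: reverse-over-reverse (a gradient of a gradient) and forward-over-reverse (a forward-mode JVP pushed through a reverse-mode gradient). This is not the paper's MixFlow-MG algorithm, only the generic building block that mixes differentiation modes. The `loss` function and the helper names are hypothetical and chosen only for this example.

```python
import jax
import jax.numpy as jnp


def loss(theta):
    # Hypothetical smooth scalar loss, used only for illustration.
    return jnp.sum(jnp.tanh(theta) ** 2) + 0.5 * jnp.sum(theta ** 2)


def hvp_reverse_over_reverse(f, theta, v):
    # Differentiate the scalar <grad f(theta), v> with a second reverse pass.
    return jax.grad(lambda t: jnp.vdot(jax.grad(f)(t), v))(theta)


def hvp_forward_over_reverse(f, theta, v):
    # Push the tangent v through grad f with forward mode (a JVP).
    return jax.jvp(jax.grad(f), (theta,), (v,))[1]


theta = jnp.array([0.3, -1.2, 2.0])
v = jnp.array([1.0, 0.5, -0.25])

hvp_rr = hvp_reverse_over_reverse(loss, theta, v)
hvp_fr = hvp_forward_over_reverse(loss, theta, v)

# Reference value from the dense Hessian (fine for a 3-dimensional toy problem).
hvp_dense = jax.hessian(loss)(theta) @ v
```

Both functions return the same vector up to floating-point error. They differ in how the computational graph is built, and that graph structure is the kind of property the paper sets out to exploit.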
Findings
Over 10x memory reduction compared to standard methods
Up to 25% wall-clock time improvement
Effective in modern meta-learning setups
Abstract
Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the gradient-based optimisation itself, leading to "gradient-of-a-gradient" calculations with computationally expensive second-order and mixed derivatives. While modern automatic differentiation libraries provide a convenient way to write programs for calculating these derivatives, they oftentimes cannot fully exploit the specific structure of these problems out-of-the-box, leading to suboptimal performance. In this paper, we analyse such cases and propose Mixed-Flow Meta-Gradients, or MixFlow-MG -- a practical algorithm that uses mixed-mode differentiation to construct more efficient and scalable computational graphs yielding over 10x memory and up to 25%…
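To make the "gradient-of-a-gradient" structure concrete, here is a minimal sketch of a toy bilevel problem in JAX. It shows the standard approach of differentiating through an unrolled inner loop, not MixFlow-MG itself. Everything in it is an assumption made for illustration: the names `inner_loss` and `outer_loss`, the meta-parameter `eta` (a log L2-penalty strength), the synthetic data, and the step count and learning rate.

```python
import jax
import jax.numpy as jnp


def inner_loss(theta, eta, x, y):
    # Hypothetical inner objective: least squares plus an L2 penalty
    # whose strength exp(eta) is the meta-parameter being learned.
    return jnp.mean((x @ theta - y) ** 2) + jnp.exp(eta) * jnp.sum(theta ** 2)


def outer_loss(eta, theta0, train, val, lr=0.1, steps=5):
    # Unrolled inner optimisation: each step calls jax.grad on inner_loss.
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * jax.grad(inner_loss)(theta, eta, *train)
    x_val, y_val = val
    # Outer objective: validation error of the adapted parameters.
    return jnp.mean((x_val @ theta - y_val) ** 2)


# Synthetic data, made up for this example.
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
true_theta = jnp.array([1.0, -2.0, 0.5])
x_train = jax.random.normal(k1, (16, 3))
x_val = jax.random.normal(k2, (16, 3))
y_train = x_train @ true_theta + 0.1 * jax.random.normal(k3, (16,))
y_val = x_val @ true_theta

eta = jnp.array(-2.0)
theta0 = jnp.zeros(3)

# Meta-gradient: differentiating outer_loss w.r.t. eta differentiates through
# every inner jax.grad call, which introduces second-order and mixed derivatives.
meta_grad = jax.grad(outer_loss)(eta, theta0, (x_train, y_train), (x_val, y_val))
```

Here the outer `jax.grad` runs reverse mode over inner steps that are themselves reverse-mode gradients, so intermediate values from every unrolled step must be kept for the backward pass. This default setup is the one the abstract describes as leading to suboptimal performance out of the box.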
Taxonomy
Topics: Flow Measurement and Analysis · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
