Meta-Learning at Scale for Large Language Models via Low-Rank Amortized Bayesian Meta-Learning

Liyi Zhang; Jake Snell; Thomas L. Griffiths

arXiv:2508.14285·cs.LG·April 3, 2026

TL;DR

This paper introduces ABMLL, a scalable meta-learning method for large language models that improves multi-dataset generalization and combines meta-learning with in-context learning.

Contribution

It adapts amortized Bayesian meta-learning to large language models using LoRA, enhancing multi-dataset generalization and scalability.

Findings

01. ABMLL outperforms existing methods on CrossFit and Unified-QA datasets.

02. Supports effective generalization across multiple datasets.

03. Combines meta-learning with in-context learning for further improvements.

Abstract

Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, when a problem requires incorporating information from multiple datasets, as in few-shot learning, generalization across datasets can be limited, driving up training costs. As a consequence, other approaches such as in-context learning are typically used in this setting. To address this challenge, we introduce an efficient method for adapting the weights of LLMs to multiple distributions, Amortized Bayesian Meta-Learning for LoRA (ABMLL). This method builds on amortized Bayesian meta-learning for smaller models, adapting this approach to LLMs by reframing where local and global variables are defined in LoRA and using a new hyperparameter to balance reconstruction accuracy and the fidelity of task-specific parameters to the global…
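
The abstract is cut off before it gives the form of this trade-off, so the following is only an illustrative sketch and not the paper's objective. It assumes that the task-specific and global parameters are each described by a diagonal Gaussian, that "fidelity" is measured by a KL divergence between them, and that the balancing hyperparameter is a scalar weight on that term. The names `beta`, `kl_diag_gaussians`, and `weighted_objective` are hypothetical and do not come from the paper.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians.

    Each argument is a list of per-dimension means or standard deviations.
    Assumption: Gaussian distributions are used here purely for illustration.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (
            math.log(sp / sq)
            + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
            - 0.5
        )
    return total


def weighted_objective(nll, mu_task, sigma_task, mu_global, sigma_global, beta):
    """Data-fit term plus a beta-weighted penalty for drifting from the global distribution.

    nll  : negative log-likelihood of the task data (the "reconstruction" term)
    beta : hypothetical trade-off weight; larger values keep the task-specific
           parameters closer to the global ones
    """
    kl = kl_diag_gaussians(mu_task, sigma_task, mu_global, sigma_global)
    return nll + beta * kl


# Usage: a task posterior whose mean is shifted by 1.0 from the global mean.
loss = weighted_objective(
    nll=2.0,
    mu_task=[1.0], sigma_task=[1.0],
    mu_global=[0.0], sigma_global=[1.0],
    beta=0.1,
)
print(loss)
```

With `beta` near zero, the objective reduces to fitting each task on its own. Raising `beta` pulls every task's parameters toward the shared distribution, which is one common way to express this kind of trade-off.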
