MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning

Sanchit Sinha; Yuguang Yue; Victor Soto; Mayank Kulkarni; Jianhua Lu; Aidong Zhang

arXiv:2405.11446·cs.CL·May 21, 2024

Open Access

TL;DR

This paper introduces MAML-en-LLM, a meta-training method for large language models that enhances their ability to generalize and adapt to unseen tasks, outperforming existing approaches in various settings.

Contribution

The paper proposes MAML-en-LLM, a novel meta-training approach that produces more generalizable LLM parameters capable of better adaptation to new tasks.

Findings

01. 2% improvement on unseen domain performance

02. 4% enhancement in adaptation performance

03. Outperforms state-of-the-art meta-training methods across 7 task settings

Abstract

Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed, such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjoint test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjoint tasks but also adapt to unseen tasks. We see an average increase of 2% in performance on unseen domains while a…
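The contrast the abstract draws between multi-task fine-tuning and MAML-style meta-training can be illustrated with a minimal sketch of the generic MAML inner/outer loop. This is not the paper's LLM training procedure: it assumes a toy family of one-dimensional regression tasks (y = a·x) with a single scalar parameter, and every name here (`sample_task`, `maml_train`, `alpha`, `beta`) is hypothetical. The point is only that the outer update is computed from the loss after a per-task adaptation step, rather than from the loss at the shared parameters directly.

```python
import numpy as np

def grad(w, x, y):
    # Gradient of the mean squared error mean((w*x - y)^2) with respect to w.
    return float(2.0 * np.mean((w * x - y) * x))

def hessian(x):
    # Second derivative of the same loss with respect to w (independent of w and y).
    return float(2.0 * np.mean(x ** 2))

def sample_task(rng, n=10):
    # Toy task (an assumption of this sketch): y = a * x with a random slope a.
    a = rng.uniform(-2.0, 2.0)
    x = rng.normal(size=2 * n)
    y = a * x
    # Split into a support set (for adaptation) and a query set (for the meta-loss).
    return (x[:n], y[:n]), (x[n:], y[n:])

def maml_train(steps=200, tasks_per_step=8, alpha=0.1, beta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = 5.0  # shared (meta) parameter, deliberately initialized far from the task slopes
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            (xs, ys), (xq, yq) = sample_task(rng)
            # Inner loop: one gradient step on the support set gives task-adapted parameters.
            w_adapted = w - alpha * grad(w, xs, ys)
            # Outer gradient: query loss at the adapted parameters, differentiated
            # through the inner step (chain rule; exact for this scalar toy problem).
            meta_grad += grad(w_adapted, xq, yq) * (1.0 - alpha * hessian(xs))
        # Outer loop: update the shared parameter with the averaged meta-gradient.
        w -= beta * meta_grad / tasks_per_step
    return w

if __name__ == "__main__":
    print(maml_train())
```

In this toy setting the slopes are drawn symmetrically around zero, so the meta-learned parameter drifts toward the centre of the task distribution, the point from which one adaptation step does best on average. Multi-task fine-tuning would instead apply `grad(w, xq, yq)` directly, with no inner adaptation step.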

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training