You Only Fine-tune Once: Many-Shot In-Context Fine-Tuning for Large Language Models
Wenchong He, Liqian Peng, Zhe Jiang, Alex Go

TL;DR
This paper introduces Many-Shot In-Context Fine-Tuning (ManyICL), a method that enhances large language models' performance on multiple tasks by treating in-context examples as supervised targets, narrowing the gap with dedicated fine-tuning.
Contribution
The paper proposes ManyICL, a novel training objective and approach that improves multi-task learning in LLMs by leveraging in-context examples as supervised signals, reducing the need for separate fine-tuning.
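To make "in-context examples as supervised signals" concrete, here is a minimal sketch of one way such an objective could be set up as a loss mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_sequence_and_mask`, the whitespace-level "tokens", and the toy arithmetic prompts are all hypothetical, and the summary above does not specify the paper's exact objective. The sketch contrasts supervising only the final answer in a many-shot sequence with supervising the answer of every in-context example.

```python
from typing import List, Tuple

# Hypothetical representation: each example is (prompt tokens, answer tokens).
Example = Tuple[List[str], List[str]]


def build_sequence_and_mask(
    examples: List[Example], supervise_all: bool
) -> Tuple[List[str], List[int]]:
    """Concatenate examples into one sequence and mark which tokens carry loss.

    supervise_all=False: only the last example's answer is a training target.
    supervise_all=True:  every example's answer is a training target
                         (an assumed reading of "in-context examples as
                         supervised targets").
    """
    tokens: List[str] = []
    mask: List[int] = []
    last = len(examples) - 1
    for i, (prompt, answer) in enumerate(examples):
        # Prompt tokens are context only and never contribute to the loss.
        tokens.extend(prompt)
        mask.extend([0] * len(prompt))
        # Answer tokens contribute to the loss when this example is a target.
        is_target = supervise_all or i == last
        tokens.extend(answer)
        mask.extend([1 if is_target else 0] * len(answer))
    return tokens, mask


# Toy many-shot sequence with three examples (illustrative data only).
shots: List[Example] = [
    (["Q:", "2+2", "A:"], ["4"]),
    (["Q:", "3+3", "A:"], ["6"]),
    (["Q:", "5+1", "A:"], ["6"]),
]

_, last_only = build_sequence_and_mask(shots, supervise_all=False)
_, all_answers = build_sequence_and_mask(shots, supervise_all=True)

# Number of supervised tokens obtained from the same sequence in each mode.
print(sum(last_only), sum(all_answers))
```

Under this assumed setup, the same long sequence yields one supervised answer per example instead of one per sequence, which is one plausible way to offset the cost of processing many in-context examples.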
Findings
ManyICL outperforms zero/few-shot fine-tuning on diverse tasks.
It approaches the performance of dedicated fine-tuning.
It mitigates catastrophic forgetting in multi-task settings.
Abstract
Large language models (LLMs) possess a remarkable ability to perform in-context learning (ICL), which enables them to handle multiple downstream tasks simultaneously without requiring task-specific fine-tuning. Recent studies have shown that even moderately sized LLMs, such as Mistral 7B, Gemma 7B and Llama-3 8B, can achieve ICL through few-shot in-context fine-tuning of all tasks at once. However, this approach still lags behind dedicated fine-tuning, where a separate model is trained for each individual task. In this paper, we propose a novel approach, Many-Shot In-Context Fine-tuning (ManyICL), which significantly narrows this performance gap by extending the principles of ICL to a many-shot setting. To unlock the full potential of ManyICL and address the inherent inefficiency of processing long sequences with numerous in-context examples, we propose a novel training objective.…
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
