On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew K. Lampinen, Arslan Chaudhry, Stephanie C.Y. Chan, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan, James L. McClelland

TL;DR
This paper compares in-context learning (ICL) and fine-tuning in large language models. It finds that ICL often generalizes more flexibly when both see the same data, and it proposes improving fine-tuning by adding in-context reasoning traces to the fine-tuning data.
Contribution
The study introduces novel datasets for evaluating generalization, and proposes a method to improve fine-tuning by incorporating in-context reasoning traces.
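As a rough illustration of the proposed augmentation, the sketch below adds in-context inferences to a fine-tuning corpus. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `infer_in_context` callable (a stand-in for prompting a language model with each document in context), and the nonsense-word fact are all hypothetical.

```python
from typing import Callable, List


def augment_with_inferences(
    documents: List[str],
    infer_in_context: Callable[[str], List[str]],
) -> List[str]:
    """Return the original documents followed by any new inferences.

    `infer_in_context` is a hypothetical stand-in for a language model that
    reads one document in its context window and writes out inferences
    (for example reversals) that follow from it.
    """
    augmented = list(documents)
    seen = set(documents)
    for doc in documents:
        for inference in infer_in_context(doc):
            # Skip duplicates so the fine-tuning set is not padded with repeats.
            if inference not in seen:
                seen.add(inference)
                augmented.append(inference)
    return augmented


def toy_infer(doc: str) -> List[str]:
    """Toy rule-based stand-in for the model: reverse a 'parent of' statement."""
    marker = " is the parent of "
    if marker not in doc:
        return []
    parent, child = doc.rstrip(".").split(marker)
    return [f"{child} is the child of {parent}."]


if __name__ == "__main__":
    corpus = ["Femp is the parent of Glon."]
    for line in augment_with_inferences(corpus, toy_infer):
        print(line)
```

In a real pipeline the toy rule would be replaced by model-generated text, and the augmented list would be passed to whatever fine-tuning procedure is already in use.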
Findings
ICL can generalize inferences more flexibly than fine-tuning in data-matched settings
Adding in-context reasoning traces to fine-tuning data improves generalization
Fine-tuning can sometimes generalize to reversals within larger knowledge structures
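The reversal and data-matched findings above can be made concrete with a small sketch of how such an evaluation might be set up. Everything here is an illustrative assumption rather than the paper's actual dataset code: the `Fact` tuple layout, the function names, and the nonsense-word example are hypothetical. The same training statements are either used as fine-tuning data or placed in the prompt for ICL, and the held-out question asks for the relation in the reverse direction.

```python
from typing import List, Tuple

# (subject, relation, inverse relation, object); a hypothetical layout.
Fact = Tuple[str, str, str, str]


def make_reversal_split(facts: List[Fact]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Build forward training statements and reversed (prompt, answer) tests."""
    train = [f"{s} {rel} {o}." for s, rel, _inv, o in facts]
    # Each test prompt states the inverse relation and expects the original subject.
    test = [(f"{o} {inv}", s) for s, _rel, inv, o in facts]
    return train, test


def build_icl_prompt(train: List[str], question: str) -> str:
    """Data-matched ICL condition: the same training statements go in context."""
    return "\n".join(train) + "\n" + question


if __name__ == "__main__":
    facts = [("Femp", "is the parent of", "is the child of", "Glon")]
    train, test = make_reversal_split(facts)
    prompt, answer = test[0]
    print(build_icl_prompt(train, prompt))
    print("expected completion:", answer)
```

In the fine-tuning condition the `train` list would be the training corpus and the test prompt would be given with an empty context, while in the ICL condition the prompt from `build_icl_prompt` would be given to the untuned model.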
Abstract
Large language models exhibit exciting capabilities, yet can show surprisingly narrow generalization from fine-tuning. For example, they can fail to generalize to simple reversals of relations they are trained on, or fail to make simple logical deductions based on trained information. These failures to generalize factual information from fine-tuning can significantly hinder the reasoning capabilities of these models. On the other hand, language models' in-context learning (ICL) shows different inductive biases and deductive reasoning capabilities. Here, we explore these differences in generalization and deductive reasoning between in-context- and fine-tuning-based learning. To do so, we constructed several novel datasets to evaluate and improve models' abilities to make generalizations over factual information from novel data. These datasets are designed to create clean tests of generalization,…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
