TL;DR
This paper compares fine-tuning (FT) and in-context learning (ICL) in large language models (LLMs) using a formal language learning framework, revealing differences in language proficiency, inductive biases, and model sensitivities.
Contribution
It introduces a formal language learning task and a discriminative proficiency test to rigorously compare FT and ICL modes in LLMs.
Findings
FT outperforms ICL on in-distribution generalization
Both modes perform similarly on out-of-distribution generalization
Inductive biases are similar when the language is only partially learned but diverge at higher proficiency
Abstract
Large language models (LLMs) operate in two fundamental learning modes, fine-tuning (FT) and in-context learning (ICL), raising key questions about which mode yields greater language proficiency and whether they differ in their inductive biases. Prior studies comparing FT and ICL have yielded mixed and inconclusive results due to inconsistent experimental setups. To enable a rigorous comparison, we propose a formal language learning task (offering precise language boundaries, controlled string sampling, and no data contamination) and introduce a discriminative test for language proficiency, where an LLM succeeds if it assigns higher generation probability to in-language strings than to out-of-language strings. Empirically, we find that: (a) FT has greater language proficiency than ICL on in-distribution generalization, but both perform equally well on out-of-distribution…
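The discriminative test described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's setup: the toy language a^n b^n, the add-one smoothed character bigram model standing in for an LLM's generation probability, the all-pairs comparison, and the helper names `in_language`, `train_bigram`, `log_prob`, and `discriminative_accuracy`.

```python
import math
from collections import Counter
from itertools import product


def in_language(s: str) -> bool:
    """Toy formal language (an assumption): a^n b^n with n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def train_bigram(samples):
    """Count character bigrams, using ^ and $ as start and end markers."""
    counts = Counter()
    for s in samples:
        padded = "^" + s + "$"
        counts.update(zip(padded, padded[1:]))
    return counts


def log_prob(counts, s, vocab="ab$"):
    """Add-one smoothed bigram log-probability of generating s.

    This stands in for an LLM's generation probability (hypothetical scorer).
    """
    padded = "^" + s + "$"
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        denom = sum(counts[(prev, c)] for c in vocab) + len(vocab)
        total += math.log((counts[(prev, nxt)] + 1) / denom)
    return total


def discriminative_accuracy(score, positives, negatives):
    """Fraction of (in-language, out-of-language) pairs in which the
    in-language string receives the strictly higher score."""
    pairs = [(p, q) for p in positives for q in negatives]
    wins = sum(score(p) > score(q) for p, q in pairs)
    return wins / len(pairs)


# "Learn" from a few in-language samples, then test on length-4 strings.
counts = train_bigram(["ab", "aabb", "aaabbb"])
candidates = ["".join(chars) for chars in product("ab", repeat=4)]
positives = [s for s in candidates if in_language(s)]
negatives = [s for s in candidates if not in_language(s)]

accuracy = discriminative_accuracy(
    lambda s: log_prob(counts, s), positives, negatives
)
print(f"discriminative accuracy: {accuracy:.3f}")
```

Comparing strings of equal length, as done here, is one way to keep string length from dominating the probabilities. A bigram scorer cannot count, so it ties on strings such as "aaab" and falls short of a perfect score, which is the kind of partial proficiency the test is meant to expose.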
