Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning
Chengye Li, Haiyun Liu, Yuanxi Li

TL;DR
This paper presents a theoretical perspective that interprets in-context learning as a form of knowledge distillation, providing insights into its mechanisms and implications for prompt engineering.
Contribution
It introduces a theoretical framework that views ICL as implicit knowledge distillation, supported by formal analysis and generalization bounds that explain observed empirical phenomena.
Findings
ICL can be understood as a knowledge distillation process.
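The bias of the distilled weights grows linearly with the Maximum Mean Discrepancy (MMD) between the prompt and target distributions.
The framework unifies prior gradient-based and distributional analyses of ICL.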
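The second finding is stated in terms of MMD. As an illustration of that quantity only (this is not the authors' code), the sketch below computes the standard unbiased estimate of squared MMD with a Gaussian RBF kernel. The kernel choice, the `bandwidth` parameter, the helper names `rbf_kernel` and `mmd2_unbiased`, and the toy "prompt" and "target" samples are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).

    The kernel and bandwidth are illustrative assumptions, not taken from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs ~ P and ys ~ Q.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]; the within-sample
    averages skip the i == j terms, which makes the estimate unbiased
    (so it can be slightly negative when P and Q are close).
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    # Hypothetical 2-D "prompt" and "target" samples; the far target is shifted more.
    rng = random.Random(0)
    prompt = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(150)]
    near = [[rng.gauss(0.1, 1.0), rng.gauss(0.1, 1.0)] for _ in range(150)]
    far = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(150)]
    print("MMD^2(prompt, near):", mmd2_unbiased(prompt, near))
    print("MMD^2(prompt, far): ", mmd2_unbiased(prompt, far))
```

Under the paper's result, a larger mismatch of this kind between demonstrations and the target distribution would correspond to a larger bias in the distilled weights; the sketch only measures the discrepancy itself.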
Abstract
In-context learning (ICL) allows large language models (LLMs) to solve novel tasks without weight updates. Despite its empirical success, the mechanism behind ICL remains poorly understood, limiting our ability to interpret, improve, and reliably apply it. In this paper, we propose a new theoretical perspective that interprets ICL as an implicit form of knowledge distillation (KD), where prompt demonstrations guide the model to form a task-specific reference model during inference. Under this view, we derive a Rademacher complexity-based generalization bound and prove that the bias of the distilled weights grows linearly with the Maximum Mean Discrepancy (MMD) between the prompt and target distributions. This theoretical framework explains several empirical phenomena and unifies prior gradient-based and distributional analyses. To the best of our knowledge, this is the first to…
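The abstract refers to a Rademacher complexity-based generalization bound. The hypothesis class used in the paper is not given here, so the sketch below assumes, purely for illustration, norm-bounded linear predictors {x ↦ ⟨w, x⟩ : ‖w‖₂ ≤ B}. For that class the supremum has a closed form by Cauchy-Schwarz, so only the expectation over the random signs needs Monte Carlo sampling. The function names and the `norm_bound`, `n_draws`, and `seed` parameters are hypothetical.

```python
import math
import random


def empirical_rademacher_linear(xs, norm_bound=1.0, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    assumed class {x -> <w, x> : ||w||_2 <= norm_bound} on the sample xs.

    For this class, sup_w (1/n) sum_i sigma_i <w, x_i>
    equals (norm_bound / n) * ||sum_i sigma_i x_i||_2 (Cauchy-Schwarz,
    attained when w points along the sum), so we only average over sigma.
    """
    if not xs:
        raise ValueError("sample must be non-empty")
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(n_draws):
        # Independent Rademacher signs, each +1 or -1 with probability 1/2.
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        summed = [sum(s * x[k] for s, x in zip(sigma, xs)) for k in range(d)]
        total += norm_bound / n * math.sqrt(sum(v * v for v in summed))
    return total / n_draws


def rademacher_upper_bound(xs, norm_bound=1.0):
    """Jensen-based bound for the same class:
    R_hat <= (norm_bound / n) * sqrt(sum_i ||x_i||^2)."""
    n = len(xs)
    return norm_bound / n * math.sqrt(sum(sum(v * v for v in x) for x in xs))


if __name__ == "__main__":
    # Hypothetical 3-D samples; the complexity should shrink as n grows.
    rng = random.Random(3)
    for n in (20, 200):
        sample = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
        print(n, empirical_rademacher_linear(sample, n_draws=300),
              rademacher_upper_bound(sample))
```

The roughly 1/√n decay of this quantity is what lets bounds of this type tighten as more samples are available; how the paper instantiates the bound for ICL is not reproduced here.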
Taxonomy
Topics: Innovative Education and Learning Practices
