Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
Xuhan Tong, Yuchen Zeng, Jiawei Zhang

TL;DR
This paper provides a theoretical framework for understanding in-context learning (ICL) in large language models, analyzing how demonstration quality, chain-of-thought prompting, and prompt templates influence generalization to new tasks.
Contribution
The paper introduces a theoretical analysis of ICL that connects practical design choices (demonstration selection, CoT prompting, and prompt templates) to model performance.
Findings
An upper bound on the ICL test loss shows that performance depends on the quality of the selected demonstrations and on model capability.
Chain-of-thought prompting helps decompose tasks into easier subtasks.
Prompt sensitivity varies with the number of demonstrations.
Abstract
In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been many theoretical efforts to explain how ICL works, most either rely on strong architectural or data assumptions, or fail to capture the impact of key practical factors such as demonstration selection, Chain-of-Thought (CoT) prompting, the number of demonstrations, and prompt templates. We address this gap by establishing a theoretical analysis of ICL under mild assumptions that links these design choices to generalization behavior. We derive an upper bound on the ICL test loss, showing that performance is governed by (i) the quality of selected demonstrations, quantified by Lipschitz constants of the ICL loss along paths connecting test prompts to pretraining samples, (ii) an intrinsic ICL…
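To make the setting concrete, the following is a minimal sketch of how an ICL prompt might be assembled from input-output demonstrations, an optional CoT rationale, and a prompt template. It is only an illustration of the design choices the abstract names, not the paper's method: the `Demonstration` class, the `build_icl_prompt` function, and the default "Q:/A:" template are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Demonstration:
    """One in-context example (hypothetical container for this sketch)."""
    question: str
    answer: str
    rationale: Optional[str] = None  # only used when building a CoT prompt


def build_icl_prompt(
    demos: Sequence[Demonstration],
    query: str,
    use_cot: bool = False,
    template: str = "Q: {question}\nA: {answer}",  # assumed template, not from the paper
) -> str:
    """Concatenate formatted demonstrations, then the query with an empty answer slot.

    No model parameters are involved: the demonstrations influence the model
    only through the text of the prompt it is conditioned on.
    """
    blocks = []
    for demo in demos:
        answer = demo.answer
        if use_cot and demo.rationale:
            # CoT variant: show the intermediate reasoning before the final answer.
            answer = f"{demo.rationale} So the answer is {demo.answer}."
        blocks.append(template.format(question=demo.question, answer=answer))
    # The query reuses the same template, leaving the answer for the model to fill in.
    blocks.append(template.format(question=query, answer="").rstrip())
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [Demonstration("2 + 3?", "5", rationale="2 plus 3 is 5.")]
    print(build_icl_prompt(demos, "4 + 1?"))
    print("---")
    print(build_icl_prompt(demos, "4 + 1?", use_cot=True))
```

Changing which entries go into `demos`, how many there are, whether `use_cot` is set, and what `template` looks like corresponds to the four practical factors the abstract lists: demonstration selection, the number of demonstrations, CoT prompting, and prompt templates.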
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Multimodal Machine Learning Applications
