Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

TL;DR
This paper treats large language models as latent variable models to explain in-context learning. It also proposes an algorithm for selecting effective demonstrations, which significantly improves performance across multiple datasets and models.
Contribution
It introduces a Bayesian perspective that views LLMs as latent variable models, and it presents a demonstration selection algorithm that improves in-context learning.
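The latent variable view can be written compactly as follows. The notation is chosen here for illustration and is not copied from the paper. θ is a latent task (concept) variable, X and Y are the demonstration inputs and labels, x is the query, and y is its predicted label. The sketch assumes that the prediction depends on the demonstrations only through θ. Under that assumption, the in-context prediction averages over tasks weighted by the posterior. Demonstrations therefore help to the extent that they concentrate this posterior on the intended task.

```latex
P_M(y \mid x, X, Y) = \int_{\Theta} P_M(y \mid x, \theta)\, P_M(\theta \mid x, X, Y)\, d\theta,
\qquad
P_M(\theta \mid x, X, Y) \propto P_M(x, X, Y \mid \theta)\, P(\theta)
```

If the posterior is sharply peaked at the intended task, the prediction approaches P_M(y | x, θ*), the output the model would give if it were told the task directly.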
Findings
Improved performance over baselines, averaged across eight GPT models on multiple datasets
Effective demonstration selection enhances in-context learning
Supports the hypothesis that LLMs infer latent task variables
Abstract
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises from regular language model pretraining objectives remain disconnected from the real-world LLMs. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LM, and then directly generalize the selected demonstrations to larger LMs. We demonstrate significant improvement over baselines, averaged over eight…
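The abstract describes two steps: selecting demonstrations with a small LM, then reusing those demonstrations unchanged with larger LMs. The toy sketch below illustrates that idea under the latent variable view. It is not the authors' implementation, and it does not reproduce how the paper estimates these quantities with a small LM. The sketch assumes a hypothetical setup with a few discrete latent tasks. It also assumes each candidate demonstration already has a score log P(demo | task), standing in for small-LM outputs. The function names, tasks, and scores are all made up for illustration. The code ranks candidates by the posterior P(task | demo) for the target task, keeps the top k, and formats them as a prompt that could then be sent to a larger model.

```python
import math
from typing import Dict, List, Tuple

# A demonstration is an (input text, label) pair.
Demo = Tuple[str, str]


def task_posterior(log_liks: Dict[str, float],
                   log_prior: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over latent tasks: P(task | demo) is proportional to
    P(demo | task) * P(task), normalized with log-sum-exp for stability."""
    scores = {t: log_liks[t] + log_prior[t] for t in log_liks}
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {t: math.exp(s - log_z) for t, s in scores.items()}


def select_demonstrations(candidates: List[Demo],
                          log_lik_table: Dict[Demo, Dict[str, float]],
                          log_prior: Dict[str, float],
                          target_task: str,
                          k: int) -> List[Demo]:
    """Rank candidates by how strongly each one points to the target task
    (its posterior probability) and keep the top k."""
    scored = []
    for demo in candidates:
        posterior = task_posterior(log_lik_table[demo], log_prior)
        scored.append((posterior[target_task], demo))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [demo for _, demo in scored[:k]]


def build_prompt(demos: List[Demo], query: str) -> str:
    """Format the selected demonstrations as a plain few-shot prompt."""
    shots = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in demos)
    return f"{shots}Input: {query}\nLabel:"


# Hypothetical scores standing in for log P(demo | task) from a small LM.
candidates = [
    ("great movie, loved it", "positive"),
    ("the match ended 2-1", "sports"),
    ("not bad at all", "positive"),
    ("stocks fell today", "negative"),
]
log_lik_table = {
    candidates[0]: {"sentiment": -2.0, "topic": -5.0},
    candidates[1]: {"sentiment": -6.0, "topic": -1.5},
    candidates[2]: {"sentiment": -3.0, "topic": -4.0},
    candidates[3]: {"sentiment": -4.0, "topic": -3.5},
}
log_prior = {"sentiment": math.log(0.5), "topic": math.log(0.5)}

chosen = select_demonstrations(candidates, log_lik_table, log_prior,
                               target_task="sentiment", k=2)
# The same prompt would then be sent, unchanged, to a larger LM.
prompt = build_prompt(chosen, "what a waste of two hours")
print(chosen)
print(prompt)
```

With these made-up scores, the two "positive" examples are kept because they carry the most evidence for the sentiment task. The sports example, which points strongly to the topic task, is ranked last.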
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing · Dropout · Adam · Attention Dropout
