Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

Xinyi Wang; Wanrong Zhu; Michael Saxon; Mark Steyvers; William Yang Wang

arXiv:2301.11916·cs.CL·February 14, 2024·27 cites

TL;DR

This paper models large language models as latent variable models to explain in-context learning and proposes an algorithm to select effective demonstrations, significantly improving performance across multiple datasets and models.

Contribution

It introduces a Bayesian perspective to understand LLMs as latent variable models and presents a demonstration selection algorithm that enhances in-context learning.
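One common way to write down a latent variable view of in-context learning is sketched below. The notation is an assumption made for illustration and is not taken from this page: x is the test input, y its output, D the set of demonstrations in the prompt, and θ a latent task variable. The sketch also assumes that y is conditionally independent of D once θ and x are given.

```latex
\[
P(y \mid x, D) \;=\; \int_{\Theta} P(y \mid x, \theta)\, P(\theta \mid x, D)\, d\theta
\]
```

Read this way, demonstrations matter only through the posterior P(θ | x, D): demonstrations that concentrate the posterior on the intended task should yield better predictions, which is the intuition behind selecting demonstrations rather than sampling them at random.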

Findings

01 Improved performance over baselines on eight GPT models and datasets
02 Effective demonstration selection enhances in-context learning
03 Supports the hypothesis that LLMs infer latent task variables

Abstract

In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises from regular language model pretraining objectives remain disconnected from the real-world LLMs. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LM, and then directly generalize the selected demonstrations to larger LMs. We demonstrate significant improvement over baselines, averaged over eight…
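The selection step described in the abstract (score annotated examples with a small LM, keep the best ones, reuse them with larger LMs) can be illustrated with a minimal sketch. This is not the authors' implementation: `select_demonstrations`, `toy_score`, and `CUES` are hypothetical names, and the cue-word scorer is only a placeholder for whatever score a small LM would assign to each candidate.

```python
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[str, str]  # (input text, label)

# Hypothetical cue words; a stand-in for a score computed by a small LM.
CUES = {
    "positive": {"great", "love"},
    "negative": {"awful", "hate"},
}


def toy_score(demo: Demo) -> float:
    """Placeholder scorer: count words that are cue words for the label."""
    text, label = demo
    words = (w.strip(".,!") for w in text.lower().split())
    return float(sum(w in CUES.get(label, set()) for w in words))


def select_demonstrations(
    candidates: Sequence[Demo],
    score_fn: Callable[[Demo], float],
    k: int,
) -> List[Demo]:
    """Return the k highest-scoring candidates, best first.

    Ties keep their original order because Python's sort is stable.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return list(ranked[:k])


candidates = [
    ("I love this great phone!", "positive"),
    ("It arrived on Tuesday.", "negative"),
    ("Awful battery, I hate it.", "negative"),
]

# The selected list would then be placed in the prompt of a larger LM.
selected = select_demonstrations(candidates, toy_score, k=2)
```

Keeping the scorer as a plain callable means the ranking logic stays the same if the placeholder is swapped for a model-based score.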

Code & Models

Repositories

wangxinyilinda/concept-based-demonstration-selection
PyTorch · Official

Videos

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing · Dropout · Adam · Attention Dropout