What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant

TL;DR
This paper demonstrates that standard Transformers can be trained from scratch to perform in-context learning of several function classes, including linear functions, neural networks, and decision trees, matching or surpassing task-specific learning algorithms (e.g., least squares for linear functions).
Contribution
The study empirically shows that Transformers can be trained to in-context learn multiple function classes, extending understanding of their capabilities beyond language tasks.
Findings
Transformers can in-context learn linear functions with performance comparable to the least-squares estimator.
In-context learning persists under distribution shifts between training and inference.
Transformers can also in-context learn more complex function classes, such as two-layer neural networks and decision trees.
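As a rough illustration of the least-squares baseline the Transformer is compared against, the sketch below samples a random linear function, shows it k in-context examples, and predicts the query output with a least-squares fit on those examples alone. The sampling choices are assumptions for illustration, not taken from this summary: weights and inputs drawn from N(0, I_d), d = 20, noiseless outputs, and squared error normalized by d.

```python
import numpy as np

def least_squares_predict(xs, ys, x_query):
    """Predict f(x_query) from the in-context pairs via least squares."""
    # np.linalg.lstsq returns the minimum-norm solution when k < d (underdetermined).
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat

def sample_linear_prompt(rng, d, k):
    """Sample f(x) = w.x, k in-context inputs, and one query (assumed Gaussian)."""
    w = rng.standard_normal(d)            # assumption: w ~ N(0, I_d)
    xs = rng.standard_normal((k, d))      # assumption: x_i ~ N(0, I_d)
    x_query = rng.standard_normal(d)
    return xs, xs @ w, x_query, x_query @ w   # assumption: no label noise

rng = np.random.default_rng(0)
d = 20
for k in (5, 10, 20, 40):
    errs = []
    for _ in range(200):
        xs, ys, xq, yq = sample_linear_prompt(rng, d, k)
        # Squared error on the query, normalized by the dimension d (assumed convention).
        errs.append((least_squares_predict(xs, ys, xq) - yq) ** 2 / d)
    print(k, np.mean(errs))
```

With noiseless labels, the error should drop to numerically zero once k reaches d, because the k examples then pin down w exactly; below that, the minimum-norm fit cannot recover the component of w outside the span of the inputs.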
Abstract
In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that…
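The setup described in the abstract can be sketched as follows: draw one function f from the class, build a prompt of pairs (x_1, f(x_1)), ..., (x_k, f(x_k)), and score a model on predicting f(x_{i+1}) from every prefix. This is a minimal sketch, not the authors' code: the linear function class, Gaussian sampling, squared loss, and the `Model` callable signature are assumptions, and `zero_model` is a hypothetical placeholder standing in for a trained Transformer.

```python
from typing import Callable

import numpy as np

# Hypothetical model interface: (context xs, context ys, query x) -> predicted y.
Model = Callable[[np.ndarray, np.ndarray, np.ndarray], float]

def sample_prompt(rng, d, n_points):
    """Draw one f(x) = w.x from the linear class and n_points labeled inputs."""
    w = rng.standard_normal(d)             # assumption: w ~ N(0, I_d)
    xs = rng.standard_normal((n_points, d))
    return xs, xs @ w

def prefix_loss(model: Model, xs, ys):
    """Mean squared error of predicting y_{i+1} from (x_1, y_1, ..., x_i, y_i, x_{i+1})."""
    losses = []
    for i in range(len(xs)):
        # The model sees only the first i pairs, then the query x_{i+1}.
        pred = model(xs[:i], ys[:i], xs[i])
        losses.append((pred - ys[i]) ** 2)
    return float(np.mean(losses))

def zero_model(xs_ctx, ys_ctx, x_query):
    # Ignores the context entirely, so its loss is just the mean of y_i^2.
    return 0.0

rng = np.random.default_rng(0)
xs, ys = sample_prompt(rng, d=20, n_points=41)
print(prefix_loss(zero_model, xs, ys))
```

Averaging over every prefix means a single prompt supervises predictions made with 0, 1, ..., k in-context examples, which is one natural way to train a model that works for any prompt length.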
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Dropout · Dense Connections
