How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

Tianyu Guo; Wei Hu; Song Mei; Huan Wang; Caiming Xiong; Silvio Savarese; Yu Bai

arXiv:2310.10616 · cs.LG · October 17, 2023 · 2 cites

TL;DR

This paper investigates how transformers perform in-context learning (ICL) when the label depends on the input through a fixed, possibly complex representation function composed with a linear function that varies per instance. It combines theoretical analysis with experiments to reveal the underlying mechanisms.

Contribution

The paper introduces synthetic ICL tasks with compositional structure, shows that transformers can implement the optimal algorithm for these tasks, and uncovers mechanisms such as copying and representation selection.

Findings

01 Transformers achieve near-optimal ICL performance on the compositional tasks studied.

02 Lower layers transform the data, while upper layers perform linear ICL.

03 Mechanisms such as copying and representation selection are observed in trained models.

Abstract

While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understandings of such capabilities are still in an early stage, where existing theory and mechanistic understanding focus mostly on simple scenarios such as learning simple function classes. This paper takes initial steps on understanding ICL in more complex scenarios, by studying learning with representations. Concretely, we construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function, composed with a linear function that differs in each instance. By construction, the optimal ICL algorithm first transforms the inputs by the representation function, and then performs linear ICL on top of the transformed dataset. We show theoretically the…
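The compositional setup in the abstract can be made concrete with a minimal NumPy sketch. It builds one fixed representation, samples instances whose linear weights change each time, and compares the "transform, then linear ICL" procedure against a plain linear fit on raw inputs. Everything specific here is an assumption for illustration and not the paper's exact construction: the one-hidden-layer leaky-ReLU representation, Gaussian inputs, ridge regression as the linear learner, and all names (`make_representation`, `sample_instance`, `compare`) and dimensions.

```python
import numpy as np

def make_representation(d, D, rng):
    """Fixed representation shared by all instances (assumed: one leaky-ReLU layer)."""
    B = rng.standard_normal((D, d)) / np.sqrt(d)
    def phi(X):
        H = X @ B.T                       # (n, D) pre-activations
        return np.where(H > 0, H, 0.01 * H)
    return phi

def sample_instance(phi, n, d, D, noise_std, rng):
    """One ICL instance: n labeled examples plus one query, with a fresh linear w."""
    w = rng.standard_normal(D) / np.sqrt(D)
    X = rng.standard_normal((n + 1, d))
    y = phi(X) @ w + noise_std * rng.standard_normal(n + 1)
    return X[:n], y[:n], X[n], y[n]

def ridge_predict(F_train, y_train, f_query, ridge=1e-3):
    """Ridge regression on given features, evaluated at one query feature vector."""
    k = F_train.shape[1]
    w_hat = np.linalg.solve(F_train.T @ F_train + ridge * np.eye(k),
                            F_train.T @ y_train)
    return float(f_query @ w_hat)

def predict_with_representation(phi, X_train, y_train, x_query):
    """Step 1: transform inputs by phi. Step 2: linear ICL (ridge) on the features."""
    return ridge_predict(phi(X_train), y_train, phi(x_query[None, :])[0])

def predict_linear_baseline(X_train, y_train, x_query):
    """Baseline that ignores the representation and fits raw inputs linearly."""
    return ridge_predict(X_train, y_train, x_query)

def compare(num_instances=200, n=40, d=5, D=10, noise_std=0.0, seed=0):
    """Average squared query error of both predictors over many instances."""
    rng = np.random.default_rng(seed)
    phi = make_representation(d, D, rng)
    err_rep, err_lin = 0.0, 0.0
    for _ in range(num_instances):
        X, y, xq, yq = sample_instance(phi, n, d, D, noise_std, rng)
        err_rep += (predict_with_representation(phi, X, y, xq) - yq) ** 2
        err_lin += (predict_linear_baseline(X, y, xq) - yq) ** 2
    return err_rep / num_instances, err_lin / num_instances

if __name__ == "__main__":
    rep_mse, lin_mse = compare()
    print(f"with representation: {rep_mse:.6f}, raw linear baseline: {lin_mse:.6f}")
```

In this sketch the representation is handed to the predictor directly. In the paper's setting the transformer has to acquire it during training, since only the linear part differs across instances.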

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: ALIGN · Focus