In-Context Learning with Representations: Contextual Generalization of Trained Transformers

Tong Yang; Yu Huang; Yingbin Liang; Yuejie Chi

arXiv:2408.10147 · cs.LG · September 27, 2024

TL;DR

This paper analyzes theoretically how one-layer multi-head transformers, trained by gradient descent, learn to generalize in context to unseen examples and tasks by effectively performing ridge regression. The training loss is proven to converge linearly to a global minimum.

Contribution

The paper offers the first provable demonstration that transformers can learn contextual information and use it to generalize when the prompt contains only a few labeled examples.

Findings

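1. The training loss converges linearly to a global minimum.

2. The trained transformer effectively learns to perform ridge regression over the basis functions.

3. This is the first theoretical proof that transformers can generalize to unseen examples and tasks when the prompt contains only a few labeled examples.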

Abstract

In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL is largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which will require the model to acquire contextual knowledge of the prompt for generalization. This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks. The contextual generalization here can be attained via learning the template function for each task in-context, where all template functions lie in a linear space with $m$ basis functions. We analyze the training dynamics of one-layer multi-head transformers to predict unlabeled inputs in context given partially labeled prompts, where the…
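To make the setup concrete, the following is a minimal sketch of the target computation named in the summary above: ridge regression over basis-function features, fitted on the labeled part of a prompt and applied to the unlabeled part. It is not the paper's transformer or its training procedure. The basis (1, x, sin x), the names `ridge_in_context` and `features`, the noise-free labels, and the regularization value are all assumptions chosen for illustration.

```python
import numpy as np


def ridge_in_context(phi_labeled, y_labeled, phi_query, lam=1e-2):
    """Fit ridge regression on labeled prompt tokens, then predict the rest.

    phi_labeled: (n, m) basis-function features of the labeled inputs
    y_labeled:   (n,)   labels of those inputs
    phi_query:   (k, m) features of the unlabeled inputs
    lam:         ridge penalty (hypothetical value, not from the paper)
    """
    m = phi_labeled.shape[1]
    gram = phi_labeled.T @ phi_labeled + lam * np.eye(m)
    coeffs = np.linalg.solve(gram, phi_labeled.T @ y_labeled)
    return phi_query @ coeffs


def features(x):
    """Hypothetical basis with m = 3 functions: 1, x, sin(x)."""
    return np.stack([np.ones_like(x), x, np.sin(x)], axis=1)


rng = np.random.default_rng(0)

# One "task": a template function that is a linear combination of the basis.
true_coeffs = rng.normal(size=3)
x_all = rng.uniform(-2.0, 2.0, size=20)
y_all = features(x_all) @ true_coeffs  # noise-free labels (assumption)

# Partially labeled prompt: only the first 8 inputs come with labels.
n_labeled = 8
preds = ridge_in_context(
    features(x_all[:n_labeled]),
    y_all[:n_labeled],
    features(x_all[n_labeled:]),
    lam=1e-6,
)
max_err = float(np.max(np.abs(preds - y_all[n_labeled:])))
print(f"max abs error on unlabeled inputs: {max_err:.2e}")
```

Because the template function lies in the span of the basis and the labels are noise-free, a small penalty recovers the unlabeled outputs almost exactly in this toy case. With noisy labels or fewer labeled examples, a larger `lam` would trade bias for variance.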

Videos

In-Context Learning with Representations: Contextual Generalization of Trained Transformers · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Machine Learning and Data Classification