In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

TL;DR
This paper analyzes theoretically how one-layer multi-head transformers, trained by gradient descent, learn to generalize in context to unseen examples and tasks: they learn to perform ridge regression, and the training loss converges linearly to a global minimum.
Contribution
It offers the first provable demonstration that transformers can learn contextual information for generalization from limited prompt data.
Findings
Training loss converges linearly to a global minimum.
Transformers effectively learn to perform ridge regression.
First theoretical proof of transformers generalizing with few prompt examples.
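To make the ridge-regression finding concrete, here is a minimal NumPy sketch of what "ridge regression over basis functions on a partially labeled prompt" could look like. It is an illustration under assumptions, not the paper's construction: the function name `ridge_in_context`, the random Gaussian features standing in for basis-function values, the noise-free labels, and the regularization strength `reg` are all hypothetical choices made for this example.

```python
import numpy as np

def ridge_in_context(features, labels, num_labeled, reg=0.1):
    """Predict labels for every prompt input from the first `num_labeled` labels.

    features: (K, m) array; row k holds the m basis-function values of input k.
    labels:   (num_labeled,) array of observed labels for the first inputs.
    reg:      ridge penalty (hypothetical value, not taken from the paper).
    """
    phi_labeled = features[:num_labeled]                 # (N, m) labeled part
    m = features.shape[1]
    # Ridge estimate of the basis coefficients:
    # (Phi^T Phi + reg * I)^{-1} Phi^T y
    gram = phi_labeled.T @ phi_labeled + reg * np.eye(m)
    coef = np.linalg.solve(gram, phi_labeled.T @ labels)
    # Apply the estimated template function to all K inputs,
    # including the unlabeled ones.
    return features @ coef

# Toy prompt: K inputs, m basis functions, only the first N inputs are labeled.
rng = np.random.default_rng(0)
K, m, N = 20, 3, 8
features = rng.normal(size=(K, m))     # stand-in for basis-function values
true_coef = rng.normal(size=m)         # the task's template coefficients
targets = features @ true_coef         # noise-free labels for all K inputs
preds = ridge_in_context(features, targets[:N], N, reg=1e-6)
```

With noise-free labels, more labeled examples than basis functions, and a very small penalty, the predictions on the unlabeled inputs should closely match the true targets; a larger `reg` shrinks the estimated coefficients toward zero.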
Abstract
In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, the theoretical understanding of ICL remains largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which requires the model to acquire contextual knowledge of the prompt for generalization. This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks. The contextual generalization here can be attained via learning the template function for each task in-context, where all template functions lie in a linear space with basis functions. We analyze the training dynamics of one-layer multi-head transformers to predict unlabeled inputs in context given partially labeled prompts, where the…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Machine Learning and Data Classification
