Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification

Shang Liu; Zhongze Cai; Guanting Chen; Xiaocheng Li

arXiv:2405.15115 · cs.LG · May 27, 2024 · 1 citation

Open Access

TL;DR

This paper investigates how Transformers learn in-context by training on linear regression tasks with uncertainty quantification, revealing their near Bayes-optimal behavior and limitations under distribution shifts.

Contribution

The paper introduces a bi-objective training task for Transformers that adds uncertainty quantification to mean prediction, yielding new insights into in-context learning and a sharper generalization bound.

Findings

01

Transformers reach near Bayes-optimal solutions in linear regression tasks.

02

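The method offers a sharper generalization bound of $\tilde{O}\left(\sqrt{\min\{S, T\}/(nT)}\right)$ than previous work, where $S$ is the Transformer's context window, $T$ is the sequence length, and $n$ is the number of training tasks.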

03

Transformers do not necessarily perform Bayesian inference under task shifts, despite being Bayes-optimal in distribution.
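
To make "Bayes-optimal" in the findings above concrete, here is a minimal sketch that compares a predictor using the task prior with one that ignores it. Everything in it is an illustrative assumption rather than the paper's setup: a scalar weight with a Gaussian prior of variance `tau2`, Gaussian noise of known variance `sigma2`, and the helper names `ols_weight`, `posterior_mean_weight`, and `simulate`.

```python
import random


def ols_weight(xs, ys):
    """Least-squares weight for y = w * x; uses no prior information."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def posterior_mean_weight(xs, ys, tau2, sigma2):
    """Posterior mean of w under the assumed prior w ~ N(0, tau2)
    and noise variance sigma2 (a ridge-style shrinkage of OLS)."""
    return sum(x * y for x, y in zip(xs, ys)) / (
        sum(x * x for x in xs) + sigma2 / tau2
    )


def simulate(num_tasks=2000, context_len=3, tau2=1.0, sigma2=1.0, seed=0):
    """Average squared error of both predictors on tasks drawn from the prior."""
    rng = random.Random(seed)
    err_ols = 0.0
    err_bayes = 0.0
    for _ in range(num_tasks):
        w = rng.gauss(0.0, tau2 ** 0.5)  # task weight sampled from the prior
        xs = [rng.gauss(0.0, 1.0) for _ in range(context_len)]
        ys = [w * x + rng.gauss(0.0, sigma2 ** 0.5) for x in xs]
        x_query = rng.gauss(0.0, 1.0)
        target = w * x_query  # noiseless label of the query point
        err_ols += (ols_weight(xs, ys) * x_query - target) ** 2
        err_bayes += (
            posterior_mean_weight(xs, ys, tau2, sigma2) * x_query - target
        ) ** 2
    return err_ols / num_tasks, err_bayes / num_tasks


mse_ols, mse_bayes = simulate()
print(f"OLS (no prior): {mse_ols:.3f}   posterior mean (uses prior): {mse_bayes:.3f}")
```

On tasks drawn from the assumed prior, the posterior-mean predictor should show the lower average error. That gap is what separates algorithms that use the training distribution from those that do not. The sketch says nothing about behavior under task shift.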

Abstract

Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks and, unlike the existing literature, consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y \mid X]$ and the conditional variance $\mathrm{Var}(Y \mid X)$. This additional uncertainty quantification objective provides a handle to (i) better design out-of-distribution experiments to distinguish ICL from in-weight learning (IWL) and (ii) better separate algorithms that use the prior information of the training distribution from those that do not. Theoretically, we show that the trained Transformer reaches near Bayes-optimum, suggesting the usage of the information of the training…
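
The bi-objective task described in the abstract can be illustrated with a loss that scores a predicted mean and a predicted variance together. This summary does not state the paper's exact training loss, so the Gaussian negative log-likelihood below is an assumption chosen for illustration, as are the names `gaussian_nll` and `average_loss` and the toy values of `w`, `x`, and `sigma`.

```python
import math
import random


def gaussian_nll(mean_pred, var_pred, y):
    """Per-sample Gaussian negative log-likelihood, additive constant dropped.
    Assumed stand-in for a bi-objective (mean, variance) loss."""
    return 0.5 * (math.log(var_pred) + (y - mean_pred) ** 2 / var_pred)


def average_loss(mean_pred, var_pred, ys):
    """Mean of the per-sample loss over observed labels ys."""
    return sum(gaussian_nll(mean_pred, var_pred, y) for y in ys) / len(ys)


# Toy linear-regression query: Y = w * x + noise with noise std sigma,
# so E[Y | X = x] = w * x and Var(Y | X = x) = sigma ** 2.
rng = random.Random(0)
w, x, sigma = 2.0, 1.5, 0.5
ys = [w * x + rng.gauss(0.0, sigma) for _ in range(5000)]

loss_true = average_loss(w * x, sigma ** 2, ys)              # correct mean and variance
loss_wrong_mean = average_loss(w * x + 1.0, sigma ** 2, ys)  # biased mean
loss_wrong_var = average_loss(w * x, 4 * sigma ** 2, ys)     # inflated variance

print(loss_true, loss_wrong_mean, loss_wrong_var)
```

Under this assumed loss, a wrong mean or a wrong variance each raises the average loss. This is why such a loss supervises both quantities at once.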

Taxonomy

Topics: AI-based Problem Solving and Planning · Intelligent Tutoring Systems and Adaptive Learning · Fault Detection and Control Systems

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections