Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li

TL;DR
This paper investigates how Transformers learn in-context by training on linear regression tasks with uncertainty quantification, revealing their near Bayes-optimal behavior and limitations under distribution shifts.
Contribution
It trains Transformers on a bi-objective task that adds uncertainty quantification (predicting the conditional variance alongside the conditional expectation), which gives new insights into in-context learning and a sharper generalization bound.
Findings
Transformers reach near Bayes-optimal solutions in linear regression tasks.
The analysis gives a sharper generalization bound than previous work, of order $\tilde{O}(\sqrt{\min\{S, T\}/(nT)})$ for $n$ training tasks with sequences of length $T$ and context window $S$.
Transformers do not necessarily perform Bayesian inference under task shifts, despite being Bayes-optimal in distribution.
Abstract
Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks and, unlike the existing literature, consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y|X]$ and the conditional variance $\mathrm{Var}(Y|X)$. This additional uncertainty quantification objective provides a handle to (i) better design out-of-distribution experiments to distinguish ICL from in-weight learning (IWL) and (ii) better separate algorithms that use the prior information of the training distribution from those that do not. Theoretically, we show that the trained Transformer reaches near Bayes-optimum, suggesting the usage of the information of the training…
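To make the bi-objective target concrete, the sketch below computes a Bayes-optimal conditional mean and conditional variance for in-context linear regression. It is an illustration under stated assumptions, not the paper's setup: the weights are assumed to follow a zero-mean Gaussian prior, the noise variance is assumed known, and the names `bayes_predict`, `prior_var`, and `noise_var` are hypothetical.

```python
import numpy as np


def bayes_predict(X, y, x_query, prior_var=1.0, noise_var=0.25):
    """Posterior-predictive mean and variance for Bayesian linear regression.

    Assumed model (not taken from the paper):
        w ~ N(0, prior_var * I),  y = w @ x + eps,  eps ~ N(0, noise_var).
    X is the (T, d) context, y the (T,) labels, x_query the (d,) query point.
    """
    d = X.shape[1]
    # Posterior precision of w given the context: prior term plus data term.
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    # Posterior mean of w.
    mean_w = cov @ X.T @ y / noise_var
    # Conditional expectation of the query label.
    pred_mean = x_query @ mean_w
    # Conditional variance: irreducible noise plus remaining weight uncertainty.
    pred_var = noise_var + x_query @ cov @ x_query
    return pred_mean, pred_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T = 4, 32
    w = rng.normal(size=d)
    X = rng.normal(size=(T, d))
    y = X @ w + rng.normal(scale=0.5, size=T)
    x_query = rng.normal(size=d)

    # The predictive variance does not increase as the context grows.
    for t in (0, 8, 32):
        mean, var = bayes_predict(X[:t], y[:t], x_query)
        print(f"context={t:2d}  mean={mean:+.3f}  var={var:.3f}")
```

A predictor that ignores the prior (for example, ordinary least squares with a plug-in residual variance) would give a different variance estimate for short contexts, which is the kind of separation the second objective is meant to expose.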
Taxonomy
Topics: AI-based Problem Solving and Planning · Intelligent Tutoring Systems and Adaptive Learning · Fault Detection and Control Systems
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
