Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes
Yongqiang Chen, Binghui Xie, Kaiwen Zhou, Bo Han, Yatao Bian, James Cheng

TL;DR
This paper studies how positional information affects in-context learning (ICL) in transformers. It finds that preserving permutation invariance to the in-context demonstrations improves out-of-distribution (OOD) performance, and that positional encodings can break this invariance.
Contribution
The paper shows that maintaining permutation invariance to the demonstrations (ICL invariance) is crucial for OOD ICL, shows that positional encodings in transformers can impair this invariance, and proposes a method to improve robustness.
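One standard way to state ICL invariance is given below, in illustrative notation that may differ from the paper's own symbols. With demonstrations (x_1, y_1), ..., (x_k, y_k) and a query x_q, a predictor f is ICL-invariant if reordering the demonstrations never changes its answer. A DeepSet satisfies this by construction, because it pools per-demonstration features phi(x_i, y_i) with a symmetric sum or mean before the readout rho.

```latex
\begin{aligned}
&f\big((x_{\pi(1)}, y_{\pi(1)}), \dots, (x_{\pi(k)}, y_{\pi(k)}), x_q\big)
  = f\big((x_1, y_1), \dots, (x_k, y_k), x_q\big)
  \quad \text{for every permutation } \pi \text{ of } \{1, \dots, k\}, \\
&f_{\mathrm{DeepSet}}\big((x_1, y_1), \dots, (x_k, y_k), x_q\big)
  = \rho\Big(\tfrac{1}{k} \textstyle\sum_{i=1}^{k} \phi(x_i, y_i),\; x_q\Big).
\end{aligned}
```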
Findings
DeepSet outperforms transformers under distribution shifts.
Preserving ICL invariance improves OOD performance.
Positional encodings can break ICL invariance.
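To make the last finding concrete, here is a minimal NumPy sketch, assuming a toy DeepSet-style predictor with random, untrained weights. The names `predict`, `W_phi`, `W_rho`, and `P` are hypothetical and not taken from the paper. The sketch reverses the demonstrations of one prompt and measures how much the prediction moves, with and without a per-slot positional encoding added before the nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, h = 4, 8, 16  # input dim, number of demos, hidden width (illustrative values)

# Hypothetical random weights: phi encodes each (x_i, y_i) pair, rho reads out the answer.
W_phi = rng.normal(size=(d + 1, h))
W_rho = rng.normal(size=(h + d,))
# Hypothetical positional encodings, one vector per demonstration slot.
P = rng.normal(size=(k, h))


def predict(xs, ys, x_query, use_pos=False):
    """Mean-pool tanh features of each (x_i, y_i), then combine with the query.

    With use_pos=True, slot i's encoding P[i] is added before the tanh, so a
    demo's features depend on where it sits in the prompt.
    """
    pairs = np.concatenate([xs, ys[:, None]], axis=1)  # (k, d + 1)
    pre = pairs @ W_phi                                 # (k, h)
    if use_pos:
        pre = pre + P                                   # index-dependent shift
    pooled = np.tanh(pre).mean(axis=0)                  # symmetric aggregation
    return float(np.concatenate([pooled, x_query]) @ W_rho)


# Sample one noiseless ICL linear-regression prompt: y_i = <w, x_i>.
w = rng.normal(size=d)
xs = rng.normal(size=(k, d))
ys = xs @ w
x_q = rng.normal(size=d)
perm = np.arange(k)[::-1]  # reverse the demo order (a fixed, non-identity permutation)

gap_plain = abs(predict(xs, ys, x_q) - predict(xs[perm], ys[perm], x_q))
gap_pos = abs(predict(xs, ys, x_q, True) - predict(xs[perm], ys[perm], x_q, True))
print(f"no positional encoding:   {gap_plain:.2e}")
print(f"with positional encoding: {gap_pos:.2e}")
```

Without positional encodings the gap is only floating-point rounding; with them, reordering changes the answer. The encoding has to enter before a nonlinearity: adding P after the tanh and then mean-pooling would only shift the pooled vector by the constant mean of P, which leaves invariance intact.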
Abstract
In-context learning (ICL) refers to the ability of a model to condition on a few in-context demonstrations (input-output examples of the underlying task) to generate the answer for a new query input, without updating parameters. Despite the impressive ICL ability of LLMs, it has also been found that ICL in LLMs is sensitive to input demonstrations and limited to short context lengths. To understand the limitations and principles for successful ICL, we conduct an investigation with ICL linear regression of transformers. We characterize several Out-of-Distribution (OOD) cases for ICL inspired by realistic LLM ICL failures and compare transformers with DeepSet, a simple yet powerful architecture for ICL. Surprisingly, DeepSet outperforms transformers across a variety of distribution shifts, implying that preserving permutation invariance symmetry to input demonstrations is crucial for OOD…
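The abstract's testbed is ICL linear regression. The sketch below is a minimal version under illustrative assumptions: Gaussian task vectors and inputs, noiseless labels, and ordinary least squares as a reference learner that reads the demonstrations (a common baseline in this line of work, not the paper's model). `sample_prompt`, `least_squares_icl`, and the `x_scale` knob are hypothetical names, and `x_scale` is not one of the paper's shift definitions.

```python
import numpy as np


def sample_prompt(rng, d, k, x_scale=1.0):
    """Sample one noiseless ICL linear-regression prompt.

    Task vector w ~ N(0, I_d); demos x_i ~ N(0, x_scale^2 I_d); y_i = <w, x_i>.
    Returns the demos, the query, and the query's true label.
    """
    w = rng.normal(size=d)
    xs = x_scale * rng.normal(size=(k, d))
    ys = xs @ w
    x_q = x_scale * rng.normal(size=d)
    return xs, ys, x_q, float(x_q @ w)


def least_squares_icl(xs, ys, x_q):
    """Fit w from the demonstrations by least squares, then answer the query."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return float(x_q @ w_hat)


rng = np.random.default_rng(1)
xs, ys, x_q, y_true = sample_prompt(rng, d=5, k=20)
perm = rng.permutation(20)

pred = least_squares_icl(xs, ys, x_q)
pred_perm = least_squares_icl(xs[perm], ys[perm], x_q)
print(abs(pred - y_true), abs(pred - pred_perm))
```

Least squares depends on the demonstrations only through the sums X^T X = sum_i x_i x_i^T and X^T y = sum_i y_i x_i, so it is permutation invariant by construction, the same symmetry a DeepSet builds in.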
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Machine Learning and Algorithms
Methods: Linear Regression
