Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes

Yongqiang Chen; Binghui Xie; Kaiwen Zhou; Bo Han; Yatao Bian; James Cheng

arXiv:2311.18194 · cs.LG · December 1, 2023 · 1 citation

TL;DR

This paper investigates the role of positional information in in-context learning (ICL) with transformers. It finds that preserving permutation invariance over the input demonstrations improves out-of-distribution (OOD) performance, and that positional encodings can break this invariance.

Contribution

The paper demonstrates that maintaining permutation invariance over demonstrations (ICL invariance) is crucial for OOD ICL, shows that positional encodings in transformers can impair this invariance, and proposes a method to improve robustness.

Findings

01. DeepSet outperforms transformers under distribution shifts.

02. Preserving ICL invariance improves OOD performance.

03. Positional encodings can break ICL invariance.

Abstract

In-context learning (ICL) refers to the ability of a model to condition on a few in-context demonstrations (input-output examples of the underlying task) to generate the answer for a new query input, without updating parameters. Despite the impressive ICL ability of LLMs, it has also been found that ICL in LLMs is sensitive to input demonstrations and limited to short context lengths. To understand the limitations and principles for successful ICL, we conduct an investigation with ICL linear regression of transformers. We characterize several Out-of-Distribution (OOD) cases for ICL inspired by realistic LLM ICL failures and compare transformers with DeepSet, a simple yet powerful architecture for ICL. Surprisingly, DeepSet outperforms transformers across a variety of distribution shifts, implying that preserving permutation invariance symmetry to input demonstrations is crucial for OOD…
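The permutation invariance discussed in the abstract can be illustrated with a small sketch. The code below is not the paper's model or experimental setup: it assumes a toy DeepSet-style predictor with random, untrained weights, and every name in it (`deepset_predict`, `W_phi`, `W_rho`, `pos`) is hypothetical. It only shows that sum-pooling over demonstrations gives the same prediction under any reordering, and that adding a slot-indexed positional encoding before pooling removes that property.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, h = 4, 8, 16  # input dimension, number of demonstrations, hidden width

# Hypothetical random weights standing in for a trained model (assumption).
W_phi = rng.normal(size=(d + 1, h))  # per-demonstration encoder
W_rho = rng.normal(size=h + d)       # decoder over [pooled features, query]


def deepset_predict(X, y, x_query, pos=None):
    """Encode each (x_i, y_i) pair, sum-pool, then decode with the query."""
    tokens = np.concatenate([X, y[:, None]], axis=1)  # shape (n, d + 1)
    if pos is not None:
        # Positional encoding is tied to the slot index, not to the content.
        tokens = tokens + pos
    pooled = np.tanh(tokens @ W_phi).sum(axis=0)      # shape (h,)
    return float(np.concatenate([pooled, x_query]) @ W_rho)


# A linear regression task: y_i = <w, x_i>.
w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w
x_query = rng.normal(size=d)

# A fixed non-identity reordering of the demonstrations (cyclic shift).
perm = np.roll(np.arange(n), 1)

# Without positional information the prediction ignores demonstration order.
base = deepset_predict(X, y, x_query)
shuffled = deepset_predict(X[perm], y[perm], x_query)
invariant = bool(np.isclose(base, shuffled))

# With a positional encoding the same reordering changes the prediction.
pos = rng.normal(size=(n, d + 1))
base_pos = deepset_predict(X, y, x_query, pos)
shuffled_pos = deepset_predict(X[perm], y[perm], x_query, pos)
gap_pos = abs(base_pos - shuffled_pos)

print("invariant without positions:", invariant)
print("prediction gap with positions:", gap_pos)
```

The sum in `deepset_predict` is what makes the order irrelevant; the positional term makes each encoded token depend on its slot, so the pooled result changes when the demonstrations are reordered.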

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: Linear Regression