Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma; Ruoxiang Xu; Yongqiang Cai

arXiv:2511.06376 · cs.LG · November 11, 2025

Open Access

TL;DR

This paper studies in-context learning with a finite vocabulary in single-layer Transformers and shows that the universal approximation property fails without positional encoding but can be achieved with it.

Contribution

It shows that positional encoding enables single-layer Transformers to achieve the universal approximation property in vocabulary in-context learning, which is not possible without it.

Findings

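01. In single-layer Transformers, vocabulary in-context learning (VICL) without positional encoding does not have the universal approximation property (UAP).

02. Several sufficient conditions on the positional encoding ensure the UAP.

03. From an approximation theory perspective, positional encoding strengthens the approximation capability of in-context learning.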

Abstract

Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, context is represented by tokens from a finite set, referred to as a vocabulary, which is the case considered in this paper, i.e., vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers, without positional encoding, does not possess the UAP; however, it is possible to achieve the UAP when positional encoding is included. Several sufficient conditions for the positional encoding are provided. Our findings reveal the benefits of positional encoding from an approximation theory perspective in the context of ICL.
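As a rough illustration of why position information can matter for a single attention layer, the sketch below computes softmax attention of one query over context tokens drawn from a small vocabulary. It is not the paper's construction or proof, and the paper's argument may rest on different considerations. It only shows one well-known property: without positional encoding the output does not depend on the order of the context tokens, while adding a position-dependent vector to each token makes the order visible. The names `attend`, `add_positions`, `VOCAB`, and `POSITIONS`, the 2-dimensional embeddings, and the identity query/key/value projections are all assumptions made for brevity.

```python
import math

def attend(query, context):
    """Single-head softmax attention of one query over context vectors.

    Assumption: identity query/key/value projections, chosen for brevity.
    """
    scores = [sum(q * k for q, k in zip(query, tok)) for tok in context]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w * tok[d] for w, tok in zip(weights, context)) / total
        for d in range(dim)
    ]

def add_positions(context, positions):
    """Add the i-th positional vector to the i-th context token."""
    return [
        [c + p for c, p in zip(tok, pos)]
        for tok, pos in zip(context, positions)
    ]

# Hypothetical vocabulary embeddings and positional encodings.
VOCAB = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
POSITIONS = [[0.5, 0.0], [0.0, -0.5], [0.3, 0.3]]

query = [0.2, -0.4]
ctx = [VOCAB[t] for t in ("a", "b", "c")]
ctx_perm = [VOCAB[t] for t in ("c", "a", "b")]  # same tokens, new order

# Without positional encoding: the two orderings give the same output.
out_plain = attend(query, ctx)
out_plain_perm = attend(query, ctx_perm)

# With positional encoding: the two orderings give different outputs.
out_pe = attend(query, add_positions(ctx, POSITIONS))
out_pe_perm = attend(query, add_positions(ctx_perm, POSITIONS))

print("no PE, original order:", out_plain)
print("no PE, permuted order:", out_plain_perm)
print("PE, original order:   ", out_pe)
print("PE, permuted order:   ", out_pe_perm)
```

In this toy setting, a context built from a finite vocabulary without positional encoding can only influence the output through which tokens appear and how often, not through their arrangement. Positional encoding lets the same tokens act as different control inputs depending on where they sit.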

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications · Machine Learning and Algorithms