Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma, Ruoxiang Xu, Yongqiang Cai

TL;DR
This paper investigates how positional encoding enhances the ability of single-layer Transformers to perform in-context learning with a finite vocabulary, demonstrating that positional encoding is crucial for universal approximation.
Contribution
The paper shows that positional encoding enables single-layer Transformers to achieve the universal approximation property in vocabulary in-context learning; without positional encoding, they do not have this property.
Findings
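Without positional encoding, single-layer Transformers do not have the universal approximation property (UAP) in vocabulary in-context learning (VICL).
With positional encoding, the UAP becomes achievable, and the paper gives several sufficient conditions on the encoding.
These results explain the benefit of positional encoding from an approximation theory perspective.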
Abstract
Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, context is represented by tokens from a finite set, referred to as a vocabulary, which is the case considered in this paper, i.e., vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers, without positional encoding, does not possess the UAP; however, it is possible to achieve the UAP when positional encoding is included. Several sufficient conditions for the positional encoding are provided. Our findings reveal the benefits of positional encoding from an approximation theory perspective in the context of ICL.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications · Machine Learning and Algorithms
