The Curious Case of Absolute Position Embeddings
Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, Joelle Pineau, Dieuwke Hupkes, Adina Williams

TL;DR
This paper shows that transformer language models trained with absolute position embeddings over-rely on absolute positions and generalize poorly to sentences whose positions are shifted, which calls into question how well these embeddings capture relative position information.
Contribution
The study demonstrates that models with absolute position embeddings degrade on sentences whose positions are shifted, which points to a limitation of learned absolute position embeddings as a way of encoding word order.
Findings
Models degrade when sentences start from a non-zero position
Models trained with absolute position embeddings over-rely on positional information
Performance drops across a range of model families and model sizes
Abstract
Transformer language models encode the notion of word order using positional information. Most commonly, this positional information is represented by absolute position embeddings (APEs) that are learned from the pretraining data. However, in natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated. In this work, we observe that models trained with APE over-rely on positional information to the point that they break down when subjected to sentences with shifted position information. Specifically, when models are subjected to sentences starting from a non-zero position (excluding the effect of priming), they exhibit noticeably degraded performance on zero to full-shot tasks, across a range of model families and model sizes. Our findings raise questions about the efficacy…
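The idea of a "sentence starting from a non-zero position" can be illustrated with a small sketch. This is not the authors' code: the tables below are random toy stand-ins for learned embeddings, and every name (`shifted_position_ids`, `make_table`, `embed`, `MAX_POSITIONS`) is a hypothetical chosen for illustration. It only shows that shifting the position ids changes the vectors a model would receive, even though the relative offsets between tokens stay the same.

```python
import random


def shifted_position_ids(seq_len, offset, max_positions):
    """Return position ids offset, offset + 1, ..., offset + seq_len - 1."""
    if offset < 0 or offset + seq_len > max_positions:
        raise ValueError("shifted positions fall outside the embedding table")
    return list(range(offset, offset + seq_len))


def make_table(rows, dim, seed):
    """Toy stand-in for a learned embedding table (random, not trained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def embed(token_ids, position_ids, token_table, position_table):
    """Sum token and absolute position embeddings, one vector per token."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for tok, pos in zip(token_ids, position_ids)
    ]


MAX_POSITIONS = 16  # assumed table size for this toy example
token_table = make_table(rows=10, dim=4, seed=0)
position_table = make_table(rows=MAX_POSITIONS, dim=4, seed=1)

token_ids = [3, 7, 1, 4]  # the same "sentence" in both conditions
base_ids = shifted_position_ids(len(token_ids), 0, MAX_POSITIONS)
shift_ids = shifted_position_ids(len(token_ids), 5, MAX_POSITIONS)

base_inputs = embed(token_ids, base_ids, token_table, position_table)
shift_inputs = embed(token_ids, shift_ids, token_table, position_table)

# Offsets relative to the first token are identical in both conditions,
# but the input vectors differ because different table rows are used.
relative_base = [p - base_ids[0] for p in base_ids]
relative_shift = [p - shift_ids[0] for p in shift_ids]
```

In this sketch the sentence content and the token-to-token distances are unchanged by the shift; only the absolute indices differ. A model that relied purely on relative position would, in principle, be unaffected by such a shift, which is the property the abstract reports APE-trained models lack.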
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Second Language Acquisition and Learning
