The Curious Case of Absolute Position Embeddings

Koustuv Sinha; Amirhossein Kazemnejad; Siva Reddy; Joelle Pineau; Dieuwke Hupkes; Adina Williams

arXiv:2210.12574 · cs.CL · October 25, 2022 · 1 citation



TL;DR

This paper investigates the limitations of absolute position embeddings in transformer language models. It shows that these models over-rely on absolute positions and generalize poorly when sentences are shifted to new positions, which calls into question how well absolute position embeddings capture relative position information.

Contribution

The study demonstrates that models with absolute position embeddings struggle with shifted sentence positions, highlighting a fundamental limitation in current positional encoding methods.

Findings

01

Models degrade when sentences start from non-zero positions

02

Absolute position embeddings over-rely on positional information

03

Performance drops across various model sizes and types

Abstract

Transformer language models encode the notion of word order using positional information. Most commonly, this positional information is represented by absolute position embeddings (APEs), which are learned from the pretraining data. However, in natural language it is not absolute position that matters but relative position, and the extent to which APEs can capture this type of information has not been investigated. In this work, we observe that models trained with APEs over-rely on positional information to the point that they break down when given sentences with shifted position information. Specifically, when models are given sentences that start from a non-zero position (excluding the effect of priming), they exhibit noticeably degraded performance on zero-shot to full-shot tasks, across a range of model families and model sizes. Our findings raise questions about the efficacy…
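The shift described above can be illustrated with a minimal, dependency-free sketch. It assumes a GPT-2-style model in which the input to the first layer is the token embedding plus a learned APE row looked up by position id. The names `make_ape_table`, `shifted_position_ids`, `embed`, and `relative_offsets` are hypothetical, and the random table is only a stand-in for a learned one. The sketch is not the authors' code. It shows that starting a sentence at position `shift` instead of 0, with no prefix tokens and therefore no priming, leaves every pairwise relative offset unchanged but feeds the model entirely different position vectors. In practice, such offset ids would be passed through a model's position-id input; for example, the Hugging Face GPT-2 `forward` accepts a `position_ids` argument.

```python
import random


def make_ape_table(max_positions, dim, seed=0):
    """Hypothetical stand-in for a learned APE matrix: one random vector per position."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_positions)]


def shifted_position_ids(num_tokens, shift):
    """Position ids for a sentence that starts at `shift` instead of 0.

    No tokens are prepended, so the model sees no extra (priming) context;
    only the position ids change.
    """
    return list(range(shift, shift + num_tokens))


def embed(token_vectors, position_ids, ape_table):
    """APE-style input: token embedding plus the absolute position embedding, elementwise."""
    out = []
    for tok, pos in zip(token_vectors, position_ids):
        if pos >= len(ape_table):
            raise ValueError(f"position {pos} exceeds the APE table size {len(ape_table)}")
        out.append([t + p for t, p in zip(tok, ape_table[pos])])
    return out


def relative_offsets(position_ids):
    """Pairwise distances j - i, the quantity relative position depends on."""
    return [[j - i for j in position_ids] for i in position_ids]


if __name__ == "__main__":
    ape = make_ape_table(max_positions=1024, dim=8)  # 1024 is an assumed context size
    tokens = [[0.1 * k] * 8 for k in range(5)]        # placeholder token embeddings
    ids_0 = shifted_position_ids(len(tokens), shift=0)
    ids_100 = shifted_position_ids(len(tokens), shift=100)
    # Relative structure is identical under the shift...
    print(relative_offsets(ids_0) == relative_offsets(ids_100))
    # ...but the vectors the model actually receives are not.
    print(embed(tokens, ids_0, ape) == embed(tokens, ids_100, ape))
```

Running the script prints `True` and then `False`. A purely relative scheme would see the two inputs as equivalent, while an APE model receives different input vectors for the same sentence. The paper reports that this difference is enough to degrade performance.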


Code & Models

Repositories

kazemnejad/lm_pos_investigations (Official)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Second Language Acquisition and Learning