Contextual Position Encoding: Learning to Count What's Important

Olga Golovneva; Tianlu Wang; Jason Weston; Sainbayar Sukhbaatar

arXiv:2405.18719 · cs.CL · May 31, 2024 · 3 cites

Open Access

TL;DR

This paper introduces Contextual Position Encoding (CoPE), a position encoding method that lets language models condition positions on context, so they can address higher-level units such as words or sentences rather than only raw token counts.

Contribution

The paper proposes CoPE, a position encoding technique in which the model determines which tokens increment the position, improving generalization on positional tasks where token-count positions fail.

Findings

01 CoPE outperforms traditional PE methods on counting and selective copy tasks.

02 CoPE improves perplexity in language modeling and coding tasks.

03 CoPE enables attending to higher-level structures like sentences or nouns.

Abstract

The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the i-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and…
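The core idea in the abstract, incrementing position only on tokens the model selects, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `contextual_positions` and the hard, rule-based `gate` are hypothetical stand-ins for a gate that CoPE would determine from context. Counting positions backward from the most recent token is also an assumption made here for illustration.

```python
from typing import Callable, List


def contextual_positions(tokens: List[str], gate: Callable[[str], bool]) -> List[int]:
    """Return one position per token, measured from the last token backward.

    The position of token j is the number of gated tokens in tokens[j:].
    `gate` is a hypothetical fixed rule used only for illustration; in CoPE
    the decision to increment is made by the model from context.
    """
    positions = [0] * len(tokens)
    count = 0
    # Walk from the most recent token to the oldest, incrementing the
    # running count only where the gate fires.
    for j in range(len(tokens) - 1, -1, -1):
        if gate(tokens[j]):
            count += 1
        positions[j] = count
    return positions


tokens = ["The", "cat", "sat", ".", "It", "slept", ".", "Then", "it", "ate"]

# Gate fires on every token: positions reduce to a plain token count.
token_pos = contextual_positions(tokens, lambda t: True)

# Gate fires only on sentence-ending periods: positions count sentences.
sentence_pos = contextual_positions(tokens, lambda t: t == ".")

print(token_pos)
print(sentence_pos)
```

With a gate that fires on every token, the result is an ordinary token count. With a gate that fires only on periods, all tokens of a sentence share one position, so a single position value addresses a whole sentence rather than one token.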

Taxonomy

Topics: Speech and dialogue systems

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax