DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

Chuanyang Zheng; Yihang Gao; Han Shi; Minbin Huang; Jingyao Li; Jing Xiong; Xiaozhe Ren; Michael Ng; Xin Jiang; Zhenguo Li; Yu Li

arXiv:2405.14722 · cs.CL · November 6, 2024

TL;DR

This paper introduces DAPE, a data-adaptive positional encoding method for transformers that dynamically adjusts to input data, significantly improving length extrapolation and generalization capabilities on real-world datasets.

Contribution

We propose a novel DAPE method that adapts positional encoding based on input context, enhancing length generalization in transformer models beyond static encoding approaches.
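To make the idea of a context-dependent positional bias concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes a single attention head, an ALiBi-style linear distance penalty as the static bias, and a tiny two-unit ReLU MLP (the hypothetical `adaptive_bias`, with hand-picked weights `w1`, `b1`, `w2`) that maps each attention logit and its static bias to an extra bias term. All function and parameter names are made up for this example.

```python
import math


def static_bias(i, j, slope):
    """Assumed static prior: an ALiBi-style penalty that grows with distance."""
    return -slope * (i - j)


def adaptive_bias(logit, bias, w1, b1, w2):
    """Hypothetical tiny MLP f(logit, bias) standing in for a learned adjustment."""
    hidden = [max(0.0, w[0] * logit + w[1] * bias + b) for w, b in zip(w1, b1)]
    return sum(h * v for h, v in zip(hidden, w2))


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k, slope, w1, b1, w2):
    """Causal attention weights with score = logit + bias + f(logit, bias)."""
    n, d = len(q), len(q[0])
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            if j > i:
                # Causal mask: a query never attends to later positions.
                row.append(float("-inf"))
                continue
            logit = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            bias = static_bias(i, j, slope)
            # The adjustment depends on the data (logit), not only on i - j.
            row.append(logit + bias + adaptive_bias(logit, bias, w1, b1, w2))
        weights.append(softmax(row))
    return weights
```

With `w2` set to all zeros the adjustment vanishes and the sketch reduces to attention with the static bias alone, which is a convenient way to compare the two behaviours on the same inputs.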

Findings

01. DAPE improves performance on length extrapolation tasks.

02. DAPE outperforms static positional encoding methods at longer sequence lengths.

03. The model maintains local and anti-local information effectively.

Abstract

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. Hence, we expect that the desired positional encoding should be data-adaptive and can be dynamically adjusted with the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performance in terms of trained length and length generalization,…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques · Video Analysis and Summarization