SeqPE: Transformer with Sequential Position Encoding

Huayang Li; Yahui Liu; Hongyu Sun; Deng Cai; Leyang Cui; Wei Bi; Peilin Zhao; Taro Watanabe

arXiv:2506.13277 · cs.LG · June 18, 2025

TL;DR

SeqPE introduces a fully learnable, unified position encoding method for Transformers that improves extrapolation and generalization across modalities by representing positions as symbolic sequences and employing regularization techniques.

Contribution

The paper proposes SeqPE, a novel position encoding framework that enhances extrapolation and adaptability in Transformers through symbolic sequence representation and end-to-end learning.

Findings

01 Outperforms baselines in perplexity, exact match (EM), and accuracy.

02 Improves extrapolation to longer contexts and to multi-dimensional inputs.

03 Generalizes to new modalities without architectural redesign.

Abstract

Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, fixed-size lookup tables used in traditional learnable position embeddings (PEs) limit extrapolation capabilities beyond pre-trained sequence lengths. Expert-designed methods such as ALiBi and RoPE mitigate this limitation but demand extensive modifications for adapting to new modalities, underscoring fundamental challenges in adaptability and scalability. In this work, we present SeqPE, a unified and fully learnable position encoding framework that represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings in an end-to-end manner. To regularize SeqPE's embedding space, we introduce two complementary objectives: a…
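The abstract's core idea, encoding a position by spelling it out as a sequence of symbols instead of indexing a table row, can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: the base-10, zero-padded digit format, the (dimension, slot, digit) token layout, the summed embeddings, and the tanh recurrent cell standing in for SeqPE's lightweight sequential encoder, whose architecture this excerpt does not specify. The weights are random and untrained, and the names `position_to_symbols`, `ToySeqPE`, and `LookupPE` are hypothetical.

```python
import math
import random


def position_to_symbols(pos, num_digits=4, base=10):
    """Turn an n-D position tuple into a flat list of (dim, slot, digit) tokens.

    Assumption for illustration: each coordinate is written in base `base`,
    zero-padded to `num_digits` digits, most significant digit first.
    """
    tokens = []
    for dim, coord in enumerate(pos):
        if not 0 <= coord < base ** num_digits:
            raise ValueError(f"coordinate {coord} does not fit in {num_digits} digits")
        digits = []
        for _ in range(num_digits):
            coord, d = divmod(coord, base)
            digits.append(d)  # collected least significant digit first
        for slot, d in enumerate(reversed(digits)):
            tokens.append((dim, slot, d))
    return tokens


def _random_table(rng, rows, cols, scale=0.1):
    """Random Gaussian matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToySeqPE:
    """Hypothetical, untrained sequential position encoder.

    Each token embedding is the sum of a digit, a digit-slot, and a dimension
    embedding; a plain tanh recurrent cell folds the token sequence into one
    vector. The recurrent cell is a stand-in, not the paper's encoder.
    """

    def __init__(self, d_model=8, num_digits=4, base=10, max_dims=2, seed=0):
        rng = random.Random(seed)
        self.num_digits, self.base, self.max_dims = num_digits, base, max_dims
        self.digit_emb = _random_table(rng, base, d_model)
        self.slot_emb = _random_table(rng, num_digits, d_model)
        self.dim_emb = _random_table(rng, max_dims, d_model)
        self.w_rec = _random_table(rng, d_model, d_model)  # d_model x d_model

    def encode(self, pos):
        if len(pos) > self.max_dims:
            raise ValueError(f"at most {self.max_dims} dimensions supported")
        h = [0.0] * len(self.w_rec)
        for dim, slot, digit in position_to_symbols(pos, self.num_digits, self.base):
            x = [a + b + c for a, b, c in
                 zip(self.digit_emb[digit], self.slot_emb[slot], self.dim_emb[dim])]
            rec = [sum(w * hj for w, hj in zip(row, h)) for row in self.w_rec]
            h = [math.tanh(xi + ri) for xi, ri in zip(x, rec)]
        return h


class LookupPE:
    """Conventional learnable PE: one row per position, fixed at construction."""

    def __init__(self, max_len, d_model=8, seed=0):
        self.table = _random_table(random.Random(seed), max_len, d_model)

    def encode(self, pos):
        if not 0 <= pos < len(self.table):
            raise IndexError(f"position {pos} is outside the table of size {len(self.table)}")
        return self.table[pos]


if __name__ == "__main__":
    print(position_to_symbols((12,)))  # [(0, 0, 0), (0, 1, 0), (0, 2, 1), (0, 3, 2)]
    seq_pe = ToySeqPE()
    print(len(seq_pe.encode((5000,))))  # 8: built from digits 5, 0, 0, 0
    print(len(seq_pe.encode((3, 17))))  # 8: a 2-D position, same encoder
    try:
        LookupPE(max_len=512).encode(5000)
    except IndexError as err:
        print("lookup table:", err)
```

The contrast is the point: a 512-row lookup table has no entry for position 5000, whereas the toy encoder composes an embedding from digit, slot, and dimension symbols it already has, and the same encoder handles 1-D and 2-D positions. Whether such an embedding is useful for an unseen position depends on training and on the regularization objectives the abstract introduces; the sketch only shows that the representation is defined there. The fixed digit width is also an assumption of this sketch and caps the representable range.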

Code & Models

Repositories

ghrua/seqpe (JAX, official)

Models

ghrua/seqpe (Hugging Face model)

Taxonomy

Topics: Sensor Technology and Measurement Systems · Photonic and Optical Devices

Methods: Attention with Linear Biases · Knowledge Distillation