RoFormer: Enhanced Transformer with Rotary Position Embedding

Jianlin Su; Yu Lu; Shengfeng Pan; Ahmed Murtadha; Bo Wen; Yunfeng Liu

arXiv:2104.09864 · cs.CL · November 9, 2023 · 229 cites

TL;DR

This paper introduces RoFormer, a transformer model enhanced with Rotary Position Embedding (RoPE), which effectively encodes positional information, improves long text classification, and offers theoretical insights into its advantages.

Contribution

The paper proposes RoPE, a rotary position embedding method that encodes absolute position with a rotation matrix while incorporating explicit relative position dependency into the self-attention formulation.

Findings

01. RoFormer outperforms existing models on long text classification benchmarks.

02. RoPE supports flexible sequence lengths, and inter-token dependency decays as relative distance increases.

03. Theoretical analysis explains the benefits of rotary position embedding.
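
The decay property in finding 02 can be probed numerically. The following is a minimal sketch, not the authors' code: it assumes per-pair rotation frequencies of the form base^(-2i/dim) with base = 10000 (an assumed value, not stated on this page), and the function name `rope_score_all_ones` is hypothetical. With query and key both set to all-ones vectors, the rotated dot product reduces to a sum of cosines that depends only on the relative distance.

```python
import numpy as np

def rope_score_all_ones(distance, dim=128, base=10000.0):
    """Attention score between two all-ones vectors whose positions differ by `distance`.

    Each 2-D feature pair (1, 1) rotated by angle a and dotted with (1, 1)
    gives 2 * cos(a), so the full score is 2 * sum_i cos(distance * theta_i).
    The frequency schedule theta_i = base^(-2i/dim) is an assumption of this sketch.
    """
    i = np.arange(dim // 2)
    theta = base ** (-2.0 * i / dim)
    return 2.0 * np.sum(np.cos(distance * theta))

distances = [0, 1, 10, 50, 100, 200]
scores = [rope_score_all_ones(d) for d in distances]
for d, s in zip(distances, scores):
    print(d, round(float(s), 2))
```

The printed scores peak at distance 0 and fall off as the distance grows. The trend oscillates rather than being strictly monotone, so this illustrates the claim without proving it.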

Abstract

Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping the linear self-attention with relative position encoding.…
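
The abstract's central idea, encoding absolute position with a rotation so that attention scores depend on relative position, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the RoFormer implementation: the helper names `rope_angles` and `apply_rope`, the pairing of consecutive features, and the base value 10000 are all choices made here for the example.

```python
import numpy as np

def rope_angles(position, dim, base=10000.0):
    # One rotation angle per 2-D feature pair: position * base^(-2i/dim).
    # The base value is an assumption of this sketch.
    i = np.arange(dim // 2)
    return position * base ** (-2.0 * i / dim)

def apply_rope(x, position, base=10000.0):
    """Rotate each consecutive pair (x[2i], x[2i+1]) of a 1-D vector by its angle."""
    dim = x.shape[-1]
    assert dim % 2 == 0, "feature dimension must be even"
    theta = rope_angles(position, dim, base)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin   # standard 2-D rotation, first component
    out[1::2] = x_even * sin + x_odd * cos   # standard 2-D rotation, second component
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# Same relative offset (4) at two different absolute positions.
score_a = apply_rope(q, 3) @ apply_rope(k, 7)
score_b = apply_rope(q, 103) @ apply_rope(k, 107)
print(np.isclose(score_a, score_b))
```

Each vector is rotated using only its own absolute position, yet the two dot products agree up to floating-point error because the rotations compose into a single rotation by the position difference. Rotations also preserve vector norms, so the embedding does not rescale queries or keys.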

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Rotary Position Embedding