RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

TL;DR
This paper introduces RoFormer, a transformer model that uses Rotary Position Embedding (RoPE) to encode positional information. The authors report gains on long text classification and give a theoretical analysis of RoPE's properties.
Contribution
The paper proposes RoPE, a rotary position embedding method that encodes absolute position with a rotation matrix while incorporating explicit relative position dependency into the self-attention formulation.
Findings
RoFormer outperforms existing models on long text classification benchmarks.
RoPE handles flexible sequence lengths, makes inter-token dependency decay as relative distance increases, and can equip linear self-attention with relative position encoding.
Theoretical analysis explains the benefits of rotary position embedding.
Abstract
Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping the linear self-attention with relative position encoding.…
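The abstract's central claim, that rotating by absolute position yields an explicit relative position dependency in attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `rope_rotate`, the pairing of adjacent dimensions, and the frequency schedule `base ** (-2i/d)` with `base = 10000` are assumptions chosen for illustration and are not stated in the text above.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle position * theta_i.

    Hypothetical helper for illustration; theta_i = base ** (-2i / d) is an
    assumed frequency schedule, not taken from the summary above.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "embedding dimension must be even"
    theta = base ** (-2.0 * np.arange(d // 2) / d)
    angles = position * theta
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    # Standard 2D rotation applied independently to each pair of dimensions.
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)  # toy query vector
k = rng.standard_normal(8)  # toy key vector

# Same relative offset (3), very different absolute positions.
score_a = rope_rotate(q, 5) @ rope_rotate(k, 2)
score_b = rope_rotate(q, 105) @ rope_rotate(k, 102)
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, because the product of the two rotations depends only on the position difference. The rotation also leaves each vector's norm unchanged.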
Code & Models
- 🤗 EleutherAI/gpt-neox-20b · model · 264k dl · ♡ 580
- 🤗 baichuan-inc/Baichuan-13B-Chat · model · 8.8k dl · ♡ 633
- 🤗 StentorLabs/Stentor2-12M · model · 124 dl · ♡ 2
- 🤗 tiiuae/falcon-40b · model · 22k dl · ♡ 2433
- 🤗 EleutherAI/gpt-j-6b · model · 116k dl · ♡ 1523
- 🤗 Milos/slovak-gpt-j-1.4B · model · 2.2k dl · ♡ 8
- 🤗 Milos/slovak-gpt-j-162M · model · 845 dl · ♡ 2
- 🤗 Milos/slovak-gpt-j-405M · model · 40k dl · ♡ 2
- 🤗 NbAiLab/nb-gpt-j-6B · model · 20 dl · ♡ 21
- 🤗 NovelAI/genji-jp · model · 14 dl · ♡ 52
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Rotary Position Embedding
