FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks
Maksim Zubkov, Daniil Gavrilov

TL;DR
This paper introduces FastRPB, a scalable and efficient relative positional encoding method for long sequence tasks in transformers, improving accuracy and computational efficiency over existing linear transformer models.
Contribution
The paper proposes FastRPB, a novel positional encoding technique that enhances linear transformers' performance and scalability for long sequences.
Findings
FastRPB achieves O(N log(N)) complexity with O(N) memory.
Improved linear transformers with SIKF outperform previous models.
FastRPB can be integrated with various self-attention mechanisms.
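The O(N log(N)) cost quoted above is consistent with a standard FFT trick. If a relative positional bias depends only on the offset i - j, it forms a Toeplitz matrix, and its product with the value vectors can be computed through a circulant embedding. The sketch below illustrates that idea in NumPy. The function name `toeplitz_matvec_fft`, the weight layout `w`, and the assumption that the bias is applied as a plain Toeplitz product are illustrative choices, not details taken from the paper.

```python
import numpy as np

def toeplitz_matvec_fft(w, v):
    """Compute y[i] = sum_j w[i - j] * v[j] without building the N x N matrix.

    w: shape (2N - 1,), hypothetical bias weights for offsets -(N-1) .. N-1,
       laid out so that w[k + N - 1] is the weight for offset k = i - j.
    v: shape (N, d), value vectors.
    """
    n = v.shape[0]
    # First column of a circulant matrix of size 2N that embeds the Toeplitz
    # matrix: offsets 0..N-1, one padding zero, then offsets -(N-1)..-1.
    c = np.concatenate([w[n - 1:], [0.0], w[:n - 1]])
    # Circular convolution via FFT; rfft zero-pads v to length 2N along axis 0.
    c_hat = np.fft.rfft(c)
    v_hat = np.fft.rfft(v, n=2 * n, axis=0)
    y = np.fft.irfft(c_hat[:, None] * v_hat, n=2 * n, axis=0)
    # Only the first N rows correspond to the Toeplitz product.
    return y[:n]

def toeplitz_matvec_dense(w, v):
    """Reference O(N^2) version, used only to check the FFT path."""
    n = v.shape[0]
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]
    return w[offsets + n - 1] @ v
```

The FFT path stores only vectors of length 2N per feature dimension, whereas the dense reference materializes the full N x N bias matrix.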
Abstract
Transformers achieve remarkable performance in various domains, including NLP, CV, audio processing, and graph analysis. However, they do not scale well to long-sequence tasks because of their quadratic complexity with respect to input length. Linear Transformers were proposed to address this limitation, but these models have shown weaker performance on long-sequence tasks compared to the original Transformer. In this paper, we explore Linear Transformer models, rethinking their two core components. First, we improve the Linear Transformer with a Shift-Invariant Kernel Function (SIKF), which achieves higher accuracy without loss in speed. Second, we introduce FastRPB (Fast Relative Positional Bias), which efficiently adds positional information to self-attention using the Fast Fourier Transform. FastRPB is independent of the self-attention mechanism and can be combined with an…
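To make the contrast between quadratic and linear attention concrete, here is a minimal non-causal linear attention sketch in NumPy. The feature map `elu(x) + 1` is a generic stand-in borrowed from earlier linear attention work and is not the paper's SIKF. The function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a strictly positive feature map.
    # Illustrative stand-in only; this is NOT the SIKF from the paper.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Kernelized attention in O(N) time with respect to sequence length.

    q, k: shape (N, d); v: shape (N, d_v).
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v           # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)      # (d,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_attention(q, k, v):
    """Same kernel, but materializing the N x N attention matrix."""
    a = feature_map(q) @ feature_map(k).T
    return (a / a.sum(axis=1, keepdims=True)) @ v
```

Both functions return the same result. The linear version avoids the N x N matrix by reordering the matrix products.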
Taxonomy
Topics: Music and Audio Processing · Neural Networks and Applications · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Adam · Label Smoothing · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer
