FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks

Maksim Zubkov; Daniil Gavrilov

arXiv:2202.11364·cs.LG·February 24, 2022



TL;DR

This paper introduces FastRPB, a scalable and efficient relative positional encoding method for long sequence tasks in transformers, improving accuracy and computational efficiency over existing linear transformer models.

Contribution

The paper proposes FastRPB, a novel positional encoding technique that enhances linear transformers' performance and scalability for long sequences.

Findings

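1. FastRPB has O(N log N) computational complexity and requires O(N) memory in the sequence length N.
2. Linear Transformers equipped with the Shift-Invariant Kernel Function (SIKF) achieve higher accuracy than the earlier Linear Transformer without loss in speed.
3. FastRPB is independent of the self-attention mechanism, so it can be combined with different self-attention variants.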

Abstract

Transformers achieve remarkable performance in various domains, including NLP, CV, audio processing, and graph analysis. However, they do not scale well on long sequence tasks because of their quadratic complexity with respect to the input length. Linear Transformers were proposed to address this limitation, but these models have shown weaker performance on long sequence tasks compared to the original Transformer. In this paper, we explore Linear Transformer models, rethinking their two core components. First, we improve the Linear Transformer with a Shift-Invariant Kernel Function (SIKF), which achieves higher accuracy without loss in speed. Second, we introduce FastRPB (Fast Relative Positional Bias), which efficiently adds positional information to self-attention using the Fast Fourier Transform. FastRPB is independent of the self-attention mechanism and can be combined with an…
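To make the role of the Fast Fourier Transform concrete, here is a minimal NumPy sketch of a standard technique that the abstract's description is consistent with. A relative positional bias depends only on the offset i - j, so the bias matrix is Toeplitz, and a Toeplitz matrix can be multiplied by the value vectors in O(N log N) time and O(N) memory by embedding it in a circulant matrix. This is not the authors' implementation: the function names, the weight layout `w`, and the choice to apply the bias directly to the values are all assumptions made for illustration, and how the bias enters the attention output is not stated in the text above.

```python
import numpy as np


def toeplitz_bias_matvec(w, v):
    """Return out[i] = sum_j w[(i - j) + N - 1] * v[j], computed with FFTs.

    w: shape (2N - 1,), hypothetical bias weights for offsets -(N-1) .. (N-1)
    v: shape (N, d), value vectors
    The N x N bias matrix is never materialised.
    """
    n = v.shape[0]
    size = 2 * n  # circulant embedding length; any length >= 2N - 1 works

    # First column of the circulant matrix:
    # offsets 0 .. N-1, one zero of padding, then offsets -(N-1) .. -1.
    c = np.zeros(size)
    c[:n] = w[n - 1:]
    c[size - (n - 1):] = w[:n - 1]

    # Circular convolution along the sequence axis; v is zero-padded to `size`.
    c_hat = np.fft.rfft(c)[:, None]
    v_hat = np.fft.rfft(v, n=size, axis=0)
    out = np.fft.irfft(c_hat * v_hat, n=size, axis=0)
    return out[:n]


def dense_bias_matvec(w, v):
    """O(N^2) reference that builds the Toeplitz bias matrix explicitly."""
    n = v.shape[0]
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]
    return w[offsets + n - 1] @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 4
    w = rng.normal(size=2 * n - 1)
    v = rng.normal(size=(n, d))
    print(np.allclose(toeplitz_bias_matvec(w, v), dense_bias_matvec(w, v)))
```

The dense reference is included only to check the FFT path on small inputs. For long sequences the FFT version stores a length-2N vector of weights instead of an N x N matrix.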


Code & Models

Repositories

maximzubkov/LinBERT (PyTorch, official)


Taxonomy

Topics: Music and Audio Processing · Neural Networks and Applications · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Adam · Label Smoothing · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer