Revisiting Linformer with a modified self-attention with linear complexity

Madhusudan Verma

arXiv:2101.10277·cs.LG·January 26, 2021

TL;DR

This paper proposes a self-attention method with linear time and space complexity that does not depend on the projection mapping dimension, making long sequences in NLP, image, and audio tasks cheaper to process.

Contribution

A novel self-attention approach with linear time and space complexity that does not depend on the projection mapping dimension, enhancing scalability.

Findings

01 Reduces computational cost for long sequences

02 Applicable to images and audio data

03 Maintains performance without hyperparameter tuning

Abstract

Although Transformer models such as Google's BERT and OpenAI's GPT-3 are successful in many natural language processing tasks, training and deploying them is costly and inefficient. Even when pre-trained models are used, deployment remains a challenge because of their large size. Beyond deployment, these models are also slow at inference, which limits their user-friendliness. The main bottleneck is self-attention, which uses quadratic time and space with respect to the sequence length. To reduce this quadratic complexity, Facebook's AI research team introduced the Linformer: they showed that the self-attention mechanism can be approximated by a low-rank matrix and, exploiting this finding, proposed a new self-attention method with linear time and space complexity. In the Linformer, the time…
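
The low-rank idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper (the abstract is truncated before it is described); it only contrasts standard attention with a Linformer-style variant in which keys and values are projected along the sequence axis from n rows down to k rows. The projection matrices E and F, the helper names, and the sizes n, d, k are assumptions chosen for illustration, and plain Python lists are used to keep the example dependency-free.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def softmax_rows(S):
    """Apply a numerically stable softmax to each row of S."""
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(Q, K, V):
    """Scaled dot-product attention; the score matrix has len(Q) x len(K) entries."""
    d = len(Q[0])
    scores = matmul(Q, transpose(K))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), V)


def projected_attention(Q, K, V, E, F):
    """Linformer-style sketch: shrink K and V from n rows to k rows first.

    E and F are hypothetical (k x n) projection matrices, so the score
    matrix is n x k instead of n x n.
    """
    return attention(Q, matmul(E, K), matmul(F, V))


def random_matrix(rows, cols, rng, scale=1.0):
    """Gaussian random matrix, used here only as illustrative data."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, k = 16, 4, 4  # sequence length, head dimension, projected length (assumed)
Q = random_matrix(n, d, rng)
K = random_matrix(n, d, rng)
V = random_matrix(n, d, rng)
E = random_matrix(k, n, rng, scale=1.0 / math.sqrt(n))
F = random_matrix(k, n, rng, scale=1.0 / math.sqrt(n))

full = attention(Q, K, V)                    # n x n score matrix
approx = projected_attention(Q, K, V, E, F)  # n x k score matrix

score_entries_full = n * n
score_entries_projected = n * k
print(score_entries_full, score_entries_projected)  # prints: 256 64
```

With k held fixed, the number of score entries grows linearly in n rather than quadratically, which is the saving the abstract attributes to the Linformer. The choice of k is exactly the projection-dimension hyperparameter that the Contribution section says the proposed method avoids.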

Taxonomy

Methods: Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Linear Layer · Multi-Head Linear Attention · Layer Normalization · WordPiece · Residual Connection · Label Smoothing