Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

Yunyang Xiong; Zhanpeng Zeng; Rudrasis Chakraborty; Mingxing Tan; Glenn Fung; Yin Li; Vikas Singh

arXiv:2102.03902·cs.CL·April 2, 2021·29 cites

TL;DR

Nyströmformer introduces a scalable self-attention approximation using the Nyström method, enabling efficient processing of long sequences in transformers while maintaining competitive performance on standard NLP benchmarks.

Contribution

It adapts the Nyström method to approximate self-attention with complexity linear in sequence length, allowing transformers to handle longer sequences efficiently.
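The Nyström idea can be sketched in NumPy as follows: instead of forming the full n × n softmax matrix, a small set of m "landmark" queries and keys is used to build three smaller softmax kernels, and the small m × m core is pseudo-inverted. This is a minimal illustration under stated assumptions, not the authors' implementation: landmarks are taken as segment means of Q and K, n is assumed divisible by m, the core is inverted exactly with `np.linalg.pinv`, and the function names `nystrom_attention` and `exact_attention` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def exact_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Standard softmax attention; materializes the full n x n score matrix."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V


def nystrom_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray,
                      num_landmarks: int) -> np.ndarray:
    """Nyström-style approximation of softmax attention (illustrative sketch).

    Assumption: landmarks are means of contiguous segments of Q and K,
    so the sequence length n must be divisible by num_landmarks.
    """
    n, d = Q.shape
    m = num_landmarks
    if n % m != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    # Landmarks: average each contiguous block of n // m rows.
    Q_tilde = Q.reshape(m, n // m, d).mean(axis=1)  # (m, d)
    K_tilde = K.reshape(m, n // m, d).mean(axis=1)  # (m, d)
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_tilde.T * scale)        # (n, m)
    A = softmax(Q_tilde @ K_tilde.T * scale)  # (m, m) core
    B = softmax(Q_tilde @ K.T * scale)        # (m, n)
    # Multiply right to left so no n x n matrix is ever formed:
    # B @ V -> (m, d_v), pinv(A) @ ... -> (m, d_v), F @ ... -> (n, d_v).
    return F @ (np.linalg.pinv(A) @ (B @ V))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 512, 64, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    approx = nystrom_attention(Q, K, V, num_landmarks=m)
    exact = exact_attention(Q, K, V)
    print(approx.shape)  # (512, 64)
    # Relative Frobenius-norm error of the approximation (data-dependent).
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

As a sanity check on the construction: when `num_landmarks` equals n, each landmark is a single token, all three kernels equal the exact softmax matrix S, and the product reduces to S S⁺ S V = S V, which is exact attention.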

Findings

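01. Performs comparably to, or in a few cases slightly better than, standard self-attention on GLUE and IMDB.

02. Performs favorably relative to other efficient self-attention methods on the Long Range Arena (LRA) benchmark.

03. Achieves $O(n)$ complexity in sequence length when approximating self-attention.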
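To see where the linear cost comes from, one can count the attention-score entries each approach materializes. The count below is a rough cost model and an assumption of this sketch: it tracks only score-matrix entries, ignores the head dimension and the O(m³) pseudo-inverse of the small core, and holds the (hypothetical) landmark count m fixed while n grows.

```python
def exact_score_entries(n: int) -> int:
    """Entries in the full n x n softmax score matrix of standard attention."""
    return n * n


def nystrom_score_entries(n: int, m: int) -> int:
    """Entries in the three Nyström kernels: (n x m) + (m x m) + (m x n)."""
    return n * m + m * m + m * n


if __name__ == "__main__":
    m = 64  # hypothetical landmark count, held fixed as n grows
    for n in (1024, 2048, 4096, 8192):
        print(n, exact_score_entries(n), nystrom_score_entries(n, m))
```

With m = 64, going from n = 4096 to n = 8192 quadruples the full score matrix (16,777,216 to 67,108,864 entries) but only roughly doubles the Nyström count (528,384 to 1,052,672), which is the linear scaling stated above.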

Abstract

Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the input sequence length has limited its application to longer sequences, a topic being actively studied in the community. To address this limitation, we propose Nyströmformer, a model that exhibits favorable scalability as a function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention with $O(n)$ complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of tokens. We perform evaluations on multiple downstream tasks on the GLUE…

Code & Models

Videos

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AI Paper Explained) · YouTube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Nyströmformer · Attention Is All You Need · Dense Connections · Byte Pair Encoding · Softmax · Dropout · Label Smoothing