Luna: Linear Unified Nested Attention

Xuezhe Ma; Xiang Kong; Sinong Wang; Chunting Zhou; Jonathan May; Hao Ma; Luke Zettlemoyer

arXiv:2106.01540 · cs.LG · November 4, 2021 · 49 cites

TL;DR

Luna introduces a linear nested attention mechanism that reduces the time and space complexity of Transformer attention from quadratic to linear in sequence length, enabling efficient long-sequence modeling while remaining competitive in performance.

Contribution

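Luna proposes a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions: the first packs the input sequence into a fixed-length sequence, and the second unpacks it, which improves scalability for long sequences.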

Findings

1. Achieves time and space complexity that is linear in sequence length for the attention operation.

2. Achieves competitive or better performance on sequence modeling benchmarks.

3. Demonstrates efficiency and effectiveness across multiple tasks.
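
The linear-complexity finding can be made concrete with a rough cost count. The symbols here are assumptions introduced for illustration: n is the input sequence length, l is the fixed length of the additional sequence, and d is the hidden dimension.

```latex
\begin{align*}
\text{full softmax attention } (n \text{ queries},\ n \text{ keys}): \quad & O(n^2 d) \text{ time}, \quad O(n^2) \text{ attention-matrix memory} \\
\text{pack step } (l \text{ queries},\ n \text{ keys}): \quad & O(l n d) \text{ time}, \quad O(l n) \text{ attention-matrix memory} \\
\text{unpack step } (n \text{ queries},\ l \text{ keys}): \quad & O(n l d) \text{ time}, \quad O(n l) \text{ attention-matrix memory}
\end{align*}
```

Because l is fixed and does not grow with the input, both nested steps scale linearly in n.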

Abstract

The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences. In this paper, we propose Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear (as opposed to quadratic) time and space complexity. Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length. Then, the packed sequence is unpacked using the second attention function. As compared to a more traditional attention mechanism, Luna introduces an additional sequence with a fixed length as input and an additional corresponding output, which allows Luna to perform attention operation linearly, while also storing adequate contextual information. We perform extensive evaluations on three benchmarks of…
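
The pack-then-unpack idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single head, plain softmax attention in both steps, keys equal to values, and no learned projections, residual connections, or normalization. The names `attention`, `luna_attention_sketch`, `x`, and `p` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, context):
    """Scaled dot-product attention with keys = values = context.

    Assumption: no learned query/key/value projections, single head.
    Cost is proportional to len(queries) * len(context).
    """
    d = len(context[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


def luna_attention_sketch(x, p):
    """Two nested attention steps over input x (n x d) and extra sequence p (l x d)."""
    # Pack: the l rows of p attend over the n input rows -> l x d.
    packed = attention(p, x)
    # Unpack: the n input rows attend over the l packed rows -> n x d.
    unpacked = attention(x, packed)
    # The packed sequence is returned as the additional output.
    return unpacked, packed


random.seed(0)
n, l, d = 12, 4, 8  # illustrative sizes only
x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
p = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(l)]
y, p_out = luna_attention_sketch(x, p)
print(len(y), len(y[0]), len(p_out), len(p_out[0]))
```

Neither step forms an n-by-n score matrix: the pack step computes l-by-n scores and the unpack step computes n-by-l scores, so with l held fixed the work grows linearly with n.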

Code & Models

Videos

Luna: Linear Unified Nested Attention · slideslive

Taxonomy

Topics: Topic Modeling · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications

Methods: Softmax