Luna: Linear Unified Nested Attention
Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

TL;DR
Luna is a nested attention mechanism that reduces the time and memory cost of Transformer attention from quadratic to linear in the sequence length, enabling efficient long-sequence modeling while remaining competitive in accuracy.
Contribution
The paper proposes Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, improving scalability to long sequences.
Findings
Achieves time and space complexity that is linear in the sequence length for the attention operation, compared with quadratic for standard softmax attention.
Performs competitively or better on sequence modeling benchmarks.
Demonstrates efficiency and effectiveness across multiple tasks.
Abstract
The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences. In this paper, we propose Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear (as opposed to quadratic) time and space complexity. Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length. Then, the packed sequence is unpacked using the second attention function. As compared to a more traditional attention mechanism, Luna introduces an additional sequence with a fixed length as input and an additional corresponding output, which allows Luna to perform attention operation linearly, while also storing adequate contextual information. We perform extensive evaluations on three benchmarks of…
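The pack-then-unpack idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single head, omits all learned projections, normalization, and residual connections, and uses plain scaled dot-product softmax attention for both steps. The names `attention`, `luna_attention`, `x`, and `p` are hypothetical and chosen only for illustration. The point it shows is that neither step ever builds an n × n score matrix: packing scores l queries against n inputs, and unpacking scores n queries against l packed vectors.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, context):
    """Scaled dot-product attention; context serves as both keys and values.

    Simplifying assumption: no learned projections. For m queries and
    k context vectors this computes an m x k score matrix.
    """
    d = len(context[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(d) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)])
    return out


def luna_attention(x, p):
    """Sketch of nested attention: pack x into len(p) vectors, then unpack."""
    packed = attention(p, x)         # pack: l x n scores
    unpacked = attention(x, packed)  # unpack: n x l scores
    return unpacked, packed


# Illustrative sizes (assumptions): n = 16 inputs, l = 4 extra vectors, d = 8.
random.seed(0)
n, l, d = 16, 4, 8
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
p = [[random.gauss(0, 1) for _ in range(d)] for _ in range(l)]

y_x, y_p = luna_attention(x, p)
nested_scores = 2 * n * l  # entries computed by pack plus unpack
full_scores = n * n        # entries in standard self-attention
print(len(y_x), len(y_p), nested_scores, full_scores)
```

With the fixed length l held constant, the number of scores computed grows as 2·n·l, which is linear in n, whereas standard self-attention grows as n². The second return value corresponds to the additional output that the abstract says accompanies the additional fixed-length input.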
Taxonomy
Topics: Topic Modeling · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications
Methods: Softmax
