LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff

TL;DR
LevAttention introduces an algorithm that identifies the large attention scores in transformers in time linear in the context length. It makes no assumptions about the data and is suited to streaming and parallel computation, which enables efficient long-context processing.
Contribution
The paper uses tools from randomized linear algebra to show that a "universal set" of key indices captures all heavy attention scores, without assumptions on the data. It applies this idea to vision transformers.
Findings
Identification of all large attention scores in time linear in the context length n.
A universal set of key indices, of size independent of n, that holds for any choice of queries.
Empirical validation on vision transformers shows improved key selection.
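For the special case f(x) = x^2 (p = 2), a short leverage-score argument explains why a universal set of size independent of n exists. The derivation below is an illustration under that assumption and is not necessarily the paper's proof, which covers a broader class of f.

The notation is as follows:
- k_j are the rows of the n × d key matrix K.
- q_i are the rows of the query matrix Q.
- τ_j is the leverage score of key j.
- M^+ is the Moore–Penrose pseudo-inverse.

The equality in the second line uses the fact that every k_j lies in the row space of K.

$$
\begin{aligned}
M &= K^\top K, \qquad \tau_j = k_j^\top M^{+} k_j, \qquad \textstyle\sum_{j=1}^{n} \tau_j = \operatorname{tr}(K M^{+} K^\top) = \operatorname{tr}(M^{+} M) = \operatorname{rank}(K) \le d, \\
(q_i^\top k_j)^2 &= \big( (M^{1/2} q_i)^\top (M^{+})^{1/2} k_j \big)^2 \le (q_i^\top M q_i)\,(k_j^\top M^{+} k_j) = \Big(\textstyle\sum_{l=1}^{n} (q_i^\top k_l)^2\Big)\,\tau_j, \\
A_{ij} &= \frac{(q_i^\top k_j)^2}{\sum_{l=1}^{n} (q_i^\top k_l)^2} \le \tau_j
\quad\Longrightarrow\quad
\{\, j : A_{ij} \ge \varepsilon \text{ for some } i \,\} \subseteq U := \{\, j : \tau_j \ge \varepsilon \,\}, \qquad |U| \le d/\varepsilon.
\end{aligned}
$$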
Abstract
A central problem related to transformers can be stated as follows. Given two $n \times d$ matrices $Q$ and $K$ and a non-negative function $f$, define the matrix $A$ in two steps: (1) apply the function $f$ to each entry of the $n \times n$ matrix $QK^\top$, and then (2) normalize each of the row sums of $A$ to be equal to $1$. The matrix $A$ can be computed in $O(n^2 d)$ time, assuming $f$ can be applied to a number in constant time. However, the quadratic dependence on $n$ is prohibitive in applications where $n$ corresponds to long context lengths. For a large class of functions $f$, we show how to find all the "large attention scores", i.e., the entries of $A$ that are at least a positive value $\varepsilon$. Our algorithm runs in time with linear dependence on $n$ (i.e., $n \cdot \mathrm{poly}(d/\varepsilon)$) for a positive parameter $\varepsilon > 0$. Our class of functions includes all functions $f$ of the…
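The NumPy sketch below illustrates the idea for $f(x) = x^2$ only. It builds a candidate universal set from the leverage scores of $K$ alone, at a cost of $O(nd^2)$, which is linear in $n$. It then checks against the naive quadratic attention matrix that every large score falls inside that set.

The function names are hypothetical and the data are synthetic. The sketch uses exact SVD-based leverage scores. It does not reproduce the paper's algorithm for general $f$ or its streaming and parallel variants.

```python
# Illustrative sketch for f(x) = x^2 only; hypothetical helper names,
# not the paper's implementation.
import numpy as np


def attention_matrix(Q, K, p=2):
    """Naive O(n^2 d) reference: apply f(x) = |x|^p to each entry of Q K^T,
    then normalize every row to sum to 1 (assumes no all-zero rows)."""
    S = np.abs(Q @ K.T) ** p
    return S / S.sum(axis=1, keepdims=True)


def leverage_scores(K):
    """tau_j = k_j^T (K^T K)^+ k_j for every row k_j of K.

    With a thin SVD K = U S V^T, tau_j is the squared norm of row j of U,
    restricted to columns with nonzero singular values. Cost: O(n d^2).
    """
    K = np.asarray(K, dtype=float)
    U, s, _ = np.linalg.svd(K, full_matrices=False)
    tol = s.max() * max(K.shape) * np.finfo(float).eps
    U = U[:, s > tol]  # keep only directions in the row space of K
    return np.sum(U ** 2, axis=1)


def universal_set(K, eps):
    """Hypothetical helper: indices of keys with leverage score >= eps.
    For p = 2 this contains every j with A[i, j] >= eps for ANY Q,
    and has at most rank(K) / eps <= d / eps elements."""
    return np.flatnonzero(leverage_scores(K) >= eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, eps = 1000, 16, 0.05
    Q = rng.standard_normal((n, d))
    K = rng.standard_normal((n, d))
    K[:5] *= 50.0  # make a few keys dominant (synthetic data only)

    U = universal_set(K, eps)        # computed without looking at Q
    A = attention_matrix(Q, K, p=2)  # quadratic; used only as a check
    heavy_cols = np.unique(np.nonzero(A >= eps)[1])
    assert set(heavy_cols.tolist()) <= set(U.tolist())
    print(f"|U| = {len(U)} (bound d/eps = {d / eps:.0f}), n = {n}")
```

Note that `universal_set` never sees `Q`. The same index set serves every possible query, which is what makes it "universal". The quadratic `attention_matrix` appears only to verify the containment.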
Taxonomy
Topics: Parallel Computing and Optimization Techniques · EEG and Brain-Computer Interfaces · Low-power high-performance VLSI design
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
