LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions

Ravindran Kannan; Chiranjib Bhattacharyya; Praneeth Kacham; David P. Woodruff

arXiv:2410.05462·cs.LG·October 10, 2024

TL;DR

LevAttention introduces a linear-time algorithm for identifying significant attention scores in transformers, enabling efficient long-context processing without data assumptions and suitable for streaming and parallel computation.

Contribution

The paper presents a novel universal set approach for heavy attention scores, leveraging randomized linear algebra, with applications to vision transformers and no data assumptions.

Findings

01

Identifies all attention scores of at least $\varepsilon$ in time linear in $n$, namely $n \cdot \mathrm{poly}(d / \varepsilon)$.

02

A universal set of key indices, of size independent of $n$, with no assumptions on the data.

03

Empirical validation on vision transformers shows improved key selection.
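
The universal-set idea in the findings above can be illustrated with a small sketch. It rests on two assumptions that this page does not confirm: that $f(x) = x^2$, and that keys are selected by thresholding the leverage scores of $K$ (suggested by the paper's title). Under those assumptions, an attention score $A_{i,j} = (q_i \cdot k_j)^2 / \|K q_i\|_2^2$ is never larger than the leverage score of row $j$ of $K$, so every score of at least $\varepsilon$ falls in a column whose leverage score is at least $\varepsilon$. The helper names `leverage_scores` and `universal_set` are hypothetical.

```python
import numpy as np

def leverage_scores(K):
    # Thin SVD gives an orthonormal basis U for the column space of K;
    # the leverage score of row j is the squared norm of row j of U.
    U, _, _ = np.linalg.svd(K, full_matrices=False)
    return np.sum(U * U, axis=1)

def universal_set(K, eps):
    # Hypothetical helper: indices of keys whose leverage score is >= eps.
    return np.flatnonzero(leverage_scores(K) >= eps)

rng = np.random.default_rng(0)
n, d, eps = 1000, 8, 0.05
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
K[:5] *= 30.0  # plant a few dominant keys so that some scores are large

U = universal_set(K, eps)  # depends on K only, not on Q

# Brute-force check (quadratic, for verification only), assuming f(x) = x^2.
S = (Q @ K.T) ** 2
A = S / S.sum(axis=1, keepdims=True)
heavy_cols = np.unique(np.nonzero(A >= eps)[1])

print(len(U), "candidate keys;", len(heavy_cols), "keys with a large score")
print("all heavy keys inside U:", set(heavy_cols.tolist()) <= set(U.tolist()))
```

Because leverage scores sum to at most $d$, at most $d / \varepsilon$ keys can pass the threshold, whatever the value of $n$. Computing the scores through a thin SVD costs $O(n d^2)$, which is linear in $n$.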

Abstract

A central problem related to transformers can be stated as follows: given two $n \times d$ matrices $Q$ and $K$, and a non-negative function $f$, define the matrix $A$ as follows: (1) apply the function $f$ to each entry of the $n \times n$ matrix $Q K^{T}$, and then (2) normalize each of the row sums of $A$ to be equal to $1$. The matrix $A$ can be computed in $O(n^{2} d)$ time assuming $f$ can be applied to a number in constant time, but the quadratic dependence on $n$ is prohibitive in applications where it corresponds to long context lengths. For a large class of functions $f$, we show how to find all the "large attention scores", i.e., entries of $A$ which are at least a positive value $\varepsilon$, in time with linear dependence on $n$ (i.e., $n \cdot \mathrm{poly}(d / \varepsilon)$) for a positive parameter $\varepsilon > 0$. Our class of functions includes all functions $f$ of the…
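
The definition of $A$ above can be written out directly as the quadratic-time baseline that the paper aims to avoid. This is a minimal sketch of that baseline only, not of the paper's algorithm; the function names `attention_matrix` and `large_scores`, the choice $f(x) = x^2$, and the random inputs are assumptions made for illustration.

```python
import numpy as np

def attention_matrix(Q, K, f):
    # Step (1): apply f to every entry of the n x n matrix Q K^T.
    scores = f(Q @ K.T)  # forming this product costs O(n^2 d)
    # Step (2): normalize so that each row sums to 1.
    return scores / scores.sum(axis=1, keepdims=True)

def large_scores(A, eps):
    # Return the (i, j) index pairs with A[i, j] >= eps.
    rows, cols = np.nonzero(A >= eps)
    return list(zip(rows.tolist(), cols.tolist()))

rng = np.random.default_rng(0)
n, d, eps = 200, 8, 0.05
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

A = attention_matrix(Q, K, np.square)  # f(x) = x^2, an illustrative choice
heavy = large_scores(A, eps)
print(len(heavy), "entries of A are at least", eps)
```

Since every row of $A$ is non-negative and sums to $1$, each row holds at most $1/\varepsilon$ entries of size at least $\varepsilon$, so the output has at most $n / \varepsilon$ pairs even though the baseline inspects all $n^2$ entries.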

Videos

LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions · SlidesLive

Taxonomy

Topics: Parallel Computing and Optimization Techniques · EEG and Brain-Computer Interfaces · Low-power high-performance VLSI design

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training