Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin

TL;DR
Conv-Basis introduces a convolution-based approximation for self-attention in transformers, reducing the cost of attention from quadratic to near-linear in the sequence length and enabling efficient processing of longer sequences in large language models.
Contribution
It develops a novel convolution basis system to approximate attention matrices, achieving near-linear time inference and training in transformers.
Findings
Achieves $O(knd \, \log n)$ inference time using FFT
Provides theoretical guarantees on runtime and approximation error
Demonstrates preliminary effectiveness in experiments
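As a rough sanity check on the stated bound, the following sketch counts operations under an assumed reading of the symbols: $n$ is the sequence length, $d$ the hidden dimension, and $k$ the number of convolution terms in the approximation. The notation $\mathrm{conv}(b_i, m_i)$ for a single term is introduced here for illustration and may differ from the paper's. If the masked attention matrix $A$ is approximated by $k$ convolution matrices and applied to a value matrix $V \in \mathbb{R}^{n \times d}$, then

$$
A V \;\approx\; \sum_{i=1}^{k} \mathrm{conv}(b_i, m_i)\, V,
\qquad
\text{cost} \;=\; \underbrace{k}_{\text{terms}} \cdot \underbrace{d}_{\text{columns of } V} \cdot \underbrace{O(n \log n)}_{\text{one FFT convolution}} \;=\; O(knd \log n).
$$

This count covers only the application of an already-known sum of convolution matrices; the cost of finding the terms and of the softmax normalization is not modeled here.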
Abstract
The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs). However, the quadratic computational cost in the input sequence length is a notorious obstacle for further improvement and scalability in longer contexts. In this work, we leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention computation using convolution matrices. We propose a conv basis system, analogous to the rank basis, and show that any lower triangular matrix can always be decomposed as a sum of structured convolution matrices in this basis. We then design a fast algorithm to approximate the attention matrix via a sum of such convolution matrices. This allows us to compute the attention inference via Fast Fourier Transforms (FFT) in $O(knd \log n)$ time, where…
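The two ideas in the abstract, decomposing a lower triangular matrix into convolution matrices and applying each one with an FFT, can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the helper names (`conv_matrix`, `sub_conv_matrix`, `decompose_lower_triangular`, `apply_sum_of_convs`) are hypothetical, the "sub-convolution" layout (a Toeplitz block in the bottom-right corner) is an assumed reading of the basis, and the greedy column-by-column decomposition is a simple exact construction swapped in for the paper's fast approximation procedure.

```python
import numpy as np

def conv_matrix(a):
    """Dense lower-triangular Toeplitz matrix whose first column is `a`."""
    n = len(a)
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]          # diff[i, j] = i - j
    return np.where(diff >= 0, a[np.clip(diff, 0, n - 1)], 0.0)

def conv_apply_fft(a, V):
    """Compute conv_matrix(a) @ V in O(n log n) per column via zero-padded FFT."""
    n = len(a)
    size = 2 * n                                # padding avoids circular wrap-around
    fa = np.fft.rfft(a, size)
    fV = np.fft.rfft(V, size, axis=0)
    return np.fft.irfft(fa[:, None] * fV, size, axis=0)[:n]

def sub_conv_matrix(b, m, n):
    """n x n matrix that is zero except for conv_matrix(b[:m]) in the
    bottom-right m x m block (assumed layout of one basis term)."""
    M = np.zeros((n, n))
    M[n - m:, n - m:] = conv_matrix(b[:m])
    return M

def decompose_lower_triangular(H):
    """Greedy exact decomposition of a lower triangular H into at most n
    sub-convolution terms (b, m). Illustrative only, not the paper's method."""
    n = H.shape[0]
    R = np.array(H, dtype=float)                # residual, stays lower triangular
    terms = []
    for j in range(n):
        b = R[j:, j].copy()                     # remaining part of column j
        if np.allclose(b, 0.0):
            continue                            # column already explained
        terms.append((b, n - j))
        R[j:, j:] -= conv_matrix(b)             # zeroes column j of the residual
    return terms

def apply_sum_of_convs(terms, V):
    """Apply sum_i sub_conv_matrix(b_i, m_i) to V using one FFT pass per term."""
    n = V.shape[0]
    out = np.zeros_like(V, dtype=float)
    for b, m in terms:
        out[n - m:] += conv_apply_fft(b, V[n - m:])
    return out

# Small demonstration on random data (values are arbitrary).
rng = np.random.default_rng(0)
H = np.tril(rng.standard_normal((8, 8)))
V = rng.standard_normal((8, 3))
terms = decompose_lower_triangular(H)
print(len(terms), np.allclose(apply_sum_of_convs(terms, V), H @ V))
```

A generic lower triangular matrix needs up to $n$ terms under this greedy scheme, so the FFT route only pays off when the matrix is well approximated by $k \ll n$ terms, which is the convolution-like structure the abstract attributes to attention matrices. A matrix that is already Toeplitz decomposes into a single term.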
Taxonomy
Topics: Advanced Memory and Neural Computing
Methods: Convolution
