Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

Yingyu Liang; Heshan Liu; Zhenmei Shi; Zhao Song; Zhuoyan Xu; Junze Yin

arXiv:2405.05219 · cs.LG · October 17, 2024

TL;DR

Conv-Basis introduces a convolution-based approximation for self-attention in transformers, reducing the quadratic cost of attention in the sequence length to near-linear time and enabling efficient processing of longer sequences in large language models.

Contribution

It develops a novel convolution basis system to approximate attention matrices, achieving near-linear time inference and training in transformers.
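The convolution matrices mentioned here can be illustrated with a minimal sketch, assuming a "convolution matrix" means a lower-triangular Toeplitz matrix whose first column is a vector $b$ (entry $(i, j)$ equals $b_{i-j}$ for $i \ge j$, and 0 otherwise). Multiplying such a matrix by a vector is a causal linear convolution, which zero-padded FFTs compute in $O(n \log n)$ time instead of the $O(n^2)$ time and memory of the dense product. The helper names `conv_matrix` and `conv_matvec_fft` are hypothetical and only show the general idea, not the authors' code.

```python
import numpy as np

def conv_matrix(b):
    """Dense lower-triangular Toeplitz matrix with M[i, j] = b[i - j] for i >= j, else 0."""
    n = b.shape[0]
    i, j = np.indices((n, n))
    return np.where(i >= j, b[np.clip(i - j, 0, n - 1)], 0.0)

def conv_matvec_fft(b, x):
    """Compute conv_matrix(b) @ x in O(n log n) as a causal linear convolution via FFT."""
    n = b.shape[0]
    size = 2 * n  # zero-pad to >= 2n - 1 so the circular FFT convolution does not wrap around
    y = np.fft.irfft(np.fft.rfft(b, size) * np.fft.rfft(x, size), size)
    return y[:n]  # entry i equals sum_{j <= i} b[i - j] * x[j]

rng = np.random.default_rng(0)
n = 512
b = rng.standard_normal(n)
x = rng.standard_normal(n)
print(np.allclose(conv_matrix(b) @ x, conv_matvec_fft(b, x)))  # True
```

The dense `conv_matrix` is built only to check the FFT path; the fast version never materializes an $n \times n$ matrix.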

Findings

01 Achieves $O(knd \log n)$ inference time using FFT, where $k$ is the number of convolution matrices
02 Provides theoretical guarantees on runtime and approximation error
03 Demonstrates preliminary effectiveness in experiments
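To see where an $O(knd \log n)$ inference cost can come from, here is a sketch under explicit assumptions: causal attention output is taken as $D^{-1} A V$ with $D = \mathrm{diag}(A \mathbf{1}_n)$, and $A$ is replaced by a sum of $k$ hypothetical sub-convolution matrices, each zero except for a bottom-right $m \times m$ lower-triangular Toeplitz block. Each term is applied to the $d$ columns of $V$ (and to $\mathbf{1}_n$ for the normalizer) with FFTs, so the total work is $O(knd \log n)$. The block structure, the names `sub_conv_apply`, `kconv_attention`, and `sub_conv_dense`, and the random basis vectors are all illustrative assumptions; how the paper obtains the basis vectors from $Q$ and $K$ is not shown here.

```python
import numpy as np

def sub_conv_apply(b, m, X):
    """Apply a hypothetical sub-convolution matrix to X of shape (n, d).

    The matrix is assumed to be zero except for its bottom-right m x m block,
    which is lower-triangular Toeplitz with first column b[:m]. Cost is
    O(d * m log m) via zero-padded FFTs along axis 0.
    """
    n = X.shape[0]
    out = np.zeros(X.shape)
    size = 2 * m  # >= 2m - 1, so the FFT product is a linear (not circular) convolution
    fb = np.fft.rfft(b[:m], size)
    fx = np.fft.rfft(X[n - m:], size, axis=0)
    out[n - m:] = np.fft.irfft(fb[:, None] * fx, size, axis=0)[:m]
    return out

def kconv_attention(bs, ms, V):
    """Approximate D^{-1} A V, assuming A is replaced by sum_r sub_conv(bs[r], ms[r])."""
    n = V.shape[0]
    ones = np.ones((n, 1))
    AV = sum(sub_conv_apply(b, m, V) for b, m in zip(bs, ms))          # k * d FFT convolutions
    row_sums = sum(sub_conv_apply(b, m, ones) for b, m in zip(bs, ms))  # diagonal of D = A 1_n
    return AV / row_sums

def sub_conv_dense(b, m, n):
    """Dense version of the same hypothetical sub-convolution matrix, for checking only."""
    M = np.zeros((n, n))
    i, j = np.indices((m, m))
    M[n - m:, n - m:] = np.where(i >= j, b[np.clip(i - j, 0, m - 1)], 0.0)
    return M

rng = np.random.default_rng(0)
n, d = 256, 8
ms = [n, n // 2, n // 8]            # k = 3 blocks; the full-size block keeps every row sum positive
bs = [rng.random(m) for m in ms]    # made-up nonnegative basis vectors
V = rng.standard_normal((n, d))

A = sum(sub_conv_dense(b, m, n) for b, m in zip(bs, ms))
ref = (A @ V) / A.sum(axis=1, keepdims=True)
print(np.allclose(kconv_attention(bs, ms, V), ref))  # True
```

Only the dense check forms $A$; the `kconv_attention` path touches nothing larger than $n \times d$ arrays.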

Abstract

The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the input sequence length $n$ is a notorious obstacle to further improvement and scalability in longer contexts. In this work, we leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention computation using convolution matrices. We propose a conv basis system, analogous to the rank basis, and show that any lower triangular matrix can always be decomposed as a sum of structured convolution matrices in this basis. We then design a fast algorithm to approximate the attention matrix via a sum of $k$ such convolution matrices. This allows us to compute attention inference via Fast Fourier Transforms (FFT) in $O(knd \log n)$ time, where $d$ is the…
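The decomposition claim can be checked numerically with a small sketch. It assumes a hypothetical sub-convolution form (zero except a bottom-right $m \times m$ lower-triangular Toeplitz block) with one block per offset $s = 0, \dots, n-1$, block $s$ having size $n - s$. The construction below, which takes differences of adjacent columns read from the diagonal down, is one straightforward choice and may differ from the paper's definitions; `sub_conv_dense`, `conv_basis_decompose`, and `reconstruct` are illustrative names. Because the sums telescope, the $n$ blocks reproduce any lower triangular $H$ exactly, and a matrix that is already a sum of $k$ such blocks yields at most $k$ nonzero basis vectors.

```python
import numpy as np

def sub_conv_dense(b, m, n):
    """Hypothetical n x n sub-convolution matrix: zero except the bottom-right m x m block,
    which is lower-triangular Toeplitz with first column b[:m]."""
    M = np.zeros((n, n))
    i, j = np.indices((m, m))
    M[n - m:, n - m:] = np.where(i >= j, b[np.clip(i - j, 0, m - 1)], 0.0)
    return M

def conv_basis_decompose(H):
    """Return c_0, ..., c_{n-1} (len(c_s) == n - s) such that
    H == sum_s sub_conv_dense(c_s, n - s, n) for a lower triangular H."""
    n = H.shape[0]
    basis = [H[:, 0].copy()]  # offset 0: the first column itself
    for s in range(1, n):
        # Column s read from the diagonal down, minus column s-1 read from its diagonal down
        # (truncated to the same length); the sums over s telescope back to H.
        basis.append(H[s:, s] - H[s - 1:n - 1, s - 1])
    return basis

def reconstruct(basis, n):
    """Sum the sub-convolution matrices; block s has size n - s."""
    return sum(sub_conv_dense(c, n - s, n) for s, c in enumerate(basis))

rng = np.random.default_rng(0)
n = 6
H = np.tril(rng.standard_normal((n, n)))
print(np.allclose(reconstruct(conv_basis_decompose(H), n), H))  # True

# A matrix built from k = 2 blocks (offsets 0 and 3) has only 2 nonzero basis vectors.
H2 = sub_conv_dense(np.arange(1.0, 7.0), 6, 6) + sub_conv_dense(np.ones(3), 3, 6)
nonzero = [s for s, c in enumerate(conv_basis_decompose(H2)) if np.any(np.abs(c) > 1e-12)]
print(nonzero)  # [0, 3]
```

Since the basis vectors have total length $n(n+1)/2$, matching the number of free entries in a lower triangular matrix, this decomposition is unique under the assumed block form.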

Videos

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers (Underline)

Taxonomy

Topics: Advanced Memory and Neural Computing

Methods: Convolution