Vision Transformers are Circulant Attention Learners
Dongchen Han, Tianyu Li, Ziyi Wang, Gao Huang

TL;DR
This paper introduces Circulant Attention, a novel efficient attention mechanism for vision Transformers that exploits the inherent block circulant structure of self-attention matrices to reduce computational complexity from quadratic to near-linear.
Contribution
The paper proposes modeling self-attention as a block circulant matrix, enabling fast computation with $\mathcal{O}(N \log N)$ complexity while maintaining model capacity.
Findings
Achieves $\mathcal{O}(N \log N)$ complexity in attention computation.
Maintains comparable performance to vanilla self-attention on visual tasks.
Provides a practical alternative for high-resolution vision Transformer models.
Abstract
The self-attention mechanism has been a key factor in the advancement of vision Transformers. However, its quadratic complexity imposes a heavy computational burden in high-resolution scenarios, restricting its practical application. Previous methods attempt to mitigate this issue by introducing handcrafted patterns such as locality or sparsity, which inevitably compromise model capacity. In this paper, we present a novel attention paradigm termed Circulant Attention by exploiting the inherent efficient pattern of self-attention. Specifically, we first identify that the self-attention matrix in vision Transformers often approximates the Block Circulant matrix with Circulant Blocks (BCCB), a kind of structured matrix whose multiplication with other matrices can be performed in $\mathcal{O}(N \log N)$ time. Leveraging this interesting pattern, we explicitly model the attention map…
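The fast-multiplication property of BCCB matrices mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract here is truncated before the method is described, so the sketch only checks the general linear-algebra fact that multiplying by a BCCB matrix equals a 2D circular convolution, which the 2D FFT computes in $\mathcal{O}(N \log N)$ for $N = H \times W$ tokens. The function names `bccb_from_kernel` and `bccb_matmul_fft`, the grid size, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def bccb_from_kernel(c):
    """Build the dense (H*W, H*W) BCCB matrix defined by an (H, W) kernel c.

    Hypothetical helper for illustration; forming this matrix costs O(N^2).
    """
    H, W = c.shape
    i, j = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    i, j = i.ravel(), j.ravel()  # row-major (i, j) coordinates of each token
    # Entry ((i, j), (k, l)) depends only on the 2D cyclic offset between tokens.
    return c[(i[:, None] - i[None, :]) % H, (j[:, None] - j[None, :]) % W]


def bccb_matmul_fft(c, x):
    """Compute A @ x without forming A, for x of shape (H, W, d).

    Uses the convolution theorem: a BCCB product is a 2D circular convolution.
    """
    C = np.fft.fft2(c, axes=(0, 1))
    X = np.fft.fft2(x, axes=(0, 1))
    return np.fft.ifft2(C[..., None] * X, axes=(0, 1)).real


rng = np.random.default_rng(0)
H, W, d = 4, 6, 3                      # assumed toy grid and channel sizes
c = rng.standard_normal((H, W))        # stands in for one row of the attention map
x = rng.standard_normal((H, W, d))     # stands in for the value tokens

A = bccb_from_kernel(c)
dense = (A @ x.reshape(H * W, d)).reshape(H, W, d)  # quadratic-cost reference
fast = bccb_matmul_fft(c, x)                        # FFT-based product

print(np.allclose(dense, fast))
```

In this sketch the whole $N \times N$ matrix is determined by a single $H \times W$ kernel, which is why the product never needs the dense matrix. How the paper obtains such a kernel from queries and keys is not stated in the text above.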
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Ferroelectric and Negative Capacitance Devices
