SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

TL;DR
This paper introduces SOFT, a novel softmax-free transformer that replaces softmax with a Gaussian kernel, enabling linear complexity self-attention and allowing longer token sequences for improved visual recognition performance.
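To make the contrast concrete, the following is a minimal NumPy sketch of conventional softmax attention next to a Gaussian-kernel variant. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `2 * sqrt(d)` kernel scale, and the use of single-head, unbatched inputs are all choices made here. Both functions still build the full n × n weight matrix, so this sketch shows only the change of similarity function, not the linear-complexity part.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Conventional attention: row-wise softmax of scaled dot products."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gaussian_kernel_attention(q, k, v):
    """Softmax-free variant: weight_ij = exp(-||q_i - k_j||^2 / scale).

    The scale 2 * sqrt(d) is an assumption made for this sketch.
    """
    d = q.shape[-1]
    sq_dist = (q ** 2).sum(-1)[:, None] + (k ** 2).sum(-1)[None, :] - 2.0 * q @ k.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    weights = np.exp(-sq_dist / (2.0 * np.sqrt(d)))
    return weights @ v  # no row normalization is applied
```

When queries and keys are the same tokens, the Gaussian weight matrix is symmetric with ones on the diagonal, which is the kind of structure a low-rank decomposition can exploit.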
Contribution
The paper proposes the first softmax-free transformer using Gaussian kernels and low-rank approximation, achieving linear complexity and better efficiency for vision transformers.
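As a rough illustration of how a low-rank approximation can make Gaussian-kernel attention linear in the number of tokens, here is a Nyström-style sketch. The summary above says only "low-rank approximation", so the details here are assumptions for illustration: the evenly spaced landmark selection, the shared token features `x` for queries and keys, the `2 * sqrt(d)` scale, and the direct call to `np.linalg.pinv` are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, scale):
    """Pairwise Gaussian kernel exp(-||a_i - b_j||^2 / scale)."""
    sq = (a ** 2).sum(-1)[:, None] + (b ** 2).sum(-1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / scale)

def lowrank_gaussian_attention(x, v, num_landmarks):
    """Approximate K @ v with K ~= K_nm @ pinv(K_mm) @ K_nm.T.

    x: (n, d) token features used as both queries and keys (assumption).
    v: (n, d_v) values. num_landmarks: m, the rank of the approximation.
    """
    n, d = x.shape
    scale = 2.0 * np.sqrt(d)  # hypothetical kernel scale
    # Hypothetical landmark choice: m evenly spaced token indices.
    idx = np.round(np.linspace(0, n - 1, num_landmarks)).astype(int)
    landmarks = x[idx]
    k_nm = gaussian_kernel(x, landmarks, scale)          # (n, m)
    k_mm = gaussian_kernel(landmarks, landmarks, scale)  # (m, m)
    # Multiply right to left so that no n x n matrix is ever formed.
    return k_nm @ (np.linalg.pinv(k_mm) @ (k_nm.T @ v))
```

For a fixed number of landmarks m, the cost is dominated by the (n, m) kernel block and the products with it, which grow linearly with n; only the small m × m block is inverted. When every token is used as a landmark, the approximation reduces to the exact kernel product.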
Findings
Significantly improves computational efficiency on ImageNet
Enables processing longer token sequences
Achieves superior accuracy-complexity trade-offs
Abstract
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the employment of self-attention modules results in a quadratic complexity in both computation and memory usage. Various attempts on approximating the self-attention computation with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Keeping this softmax operation challenges any subsequent linearization efforts. Based on this insight,…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Softmax
