BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers
Chaodong Xiao, Zhengqiang Zhang, Lei Zhang

TL;DR
BinaryAttention is a 1-bit quantization method for the query-key attention in vision and diffusion transformers. It substantially reduces computational cost while matching or improving accuracy, using a learnable bias and quantization-aware training.
Contribution
The paper presents a novel 1-bit attention mechanism that preserves essential similarity relationships and accelerates transformer computations for vision tasks.
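One well-known result that makes the "preserves similarity relationships" claim plausible is the arcsine law: for two zero-mean, unit-variance jointly Gaussian variables with correlation rho, the expected product of their signs is (2/pi) * arcsin(rho), which increases monotonically with rho. The sketch below checks this numerically. It is an illustration under a Gaussian assumption, not necessarily the argument used in the paper, and the function names `sign_similarity` and `arcsine_law` are hypothetical.

```python
import numpy as np

def sign_similarity(rho, d=200_000, seed=0):
    """Mean product of signs for two Gaussian vectors with correlation rho.

    Assumption: entries are i.i.d. standard Gaussian, which real query/key
    features only approximate.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    z = rng.standard_normal(d)
    # y has unit variance and correlation rho with x.
    y = rho * x + np.sqrt(1.0 - rho**2) * z
    return float(np.mean(np.sign(x) * np.sign(y)))

def arcsine_law(rho):
    """Closed-form expectation of sign(x) * sign(y) for correlated Gaussians."""
    return 2.0 / np.pi * np.arcsin(rho)

if __name__ == "__main__":
    for rho in (-0.8, 0.0, 0.5, 0.9):
        print(rho, sign_similarity(rho), arcsine_law(rho))
```

Because the mapping from rho to the sign-based similarity is monotonic, the ordering of similarities is retained in expectation even though magnitudes are compressed.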
Findings
BinaryAttention is over 2x faster than FlashAttention2 on A100 GPUs.
It matches or exceeds full-precision attention accuracy in vision and diffusion transformer benchmarks.
The method effectively mitigates information loss through learnable biases and quantization-aware training.
Abstract
Transformers have achieved widespread and remarkable success, while the computational complexity of their attention modules remains a major bottleneck for vision tasks. Existing methods mainly employ 8-bit or 4-bit quantization to balance efficiency and accuracy. In this paper, with theoretical justification, we show that binarization of attention preserves the essential similarity relationships, and propose BinaryAttention, an effective method for fast and accurate 1-bit qk-attention. Specifically, we retain only the sign of queries and keys in computing the attention, and replace the floating-point dot products with bit-wise operations, significantly reducing the computational cost. We mitigate the inherent information loss under 1-bit quantization by incorporating a learnable bias, and enable end-to-end acceleration. To maintain the accuracy of attention, we adopt quantization-aware…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
