FlashBias: Fast Computation of Attention with Bias

Haixu Wu; Minghao Guo; Yuezhou Ma; Yuanxu Sun; Jianmin Wang; Wojciech Matusik; Mingsheng Long

arXiv:2505.12044·cs.LG·October 27, 2025

TL;DR

FlashBias is a low-rank method for accelerating attention with bias. It reports a 1.5× speedup on AlphaFold 3 and over 2× speedup on vision and language models, with accuracy maintained in both cases.

Contribution

The paper proposes a low-rank compression approach for fast computation of biased attention, either exact or approximate, addressing a key efficiency bottleneck.
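To make the low-rank idea concrete, here is a minimal NumPy sketch of the exact case. It assumes the bias matrix factors exactly as `bias = u_q @ u_k.T`, and it uses a linear relative-position bias `slope * (j - i)` only because that bias has rank 2. The function names, the factor names `u_q` and `u_k`, and the choice of bias are illustrative assumptions, not taken from the paper's code. The sketch checks only the algebraic identity: appending the factors to the queries and keys reproduces biased attention without ever forming the N×N bias matrix. It does not demonstrate any kernel-level speedup.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_dense_bias(q, k, v, bias):
    # Reference: softmax(q k^T / sqrt(d) + bias) v with an explicit (n, n) bias.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    return softmax(scores) @ v

def attention_factored_bias(q, k, v, u_q, u_k):
    # Assumes bias == u_q @ u_k.T exactly (hypothetical factor names).
    # Appending sqrt(d) * u_q to q and u_k to k folds the bias into the
    # ordinary dot product, so no (n, n) bias matrix is materialized.
    d = q.shape[-1]  # scale uses the original head dimension
    q_ext = np.concatenate([q, np.sqrt(d) * u_q], axis=-1)
    k_ext = np.concatenate([k, u_k], axis=-1)
    scores = q_ext @ k_ext.T / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

# Illustrative rank-2 bias: bias[i, j] = slope * (j - i).
slope = 0.5
pos = np.arange(n, dtype=np.float64)
u_q = np.stack([-slope * pos, np.ones(n)], axis=-1)  # rows: [-slope * i, 1]
u_k = np.stack([np.ones(n), slope * pos], axis=-1)   # rows: [1, slope * j]
bias = u_q @ u_k.T

out_dense = attention_dense_bias(q, k, v, bias)
out_factored = attention_factored_bias(q, k, v, u_q, u_k)
print(np.allclose(out_dense, out_factored))
```

In this sketch the factors occupy 2·n·r values instead of n² for the dense bias (here r = 2). When a bias has no exact low-rank factorization, the same construction applied to approximate factors would give approximate rather than exact outputs.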

Findings

01 Achieves 1.5× speedup in AlphaFold 3 with no accuracy loss

02 Over 2× speedup in vision and language models with maintained accuracy

03 Theoretically links optimal efficiency to the rank of attention weight matrices

Abstract

Attention with bias, which extends standard attention by introducing prior knowledge as an additive bias matrix to the query-key scores, has been widely deployed in vision, language, protein-folding and other advanced scientific models, underscoring its status as a key evolution of this foundational module. However, introducing bias terms creates a severe efficiency bottleneck in attention computation. It disrupts the tightly fused memory-compute pipeline that underlies the speed of accelerators like FlashAttention, thereby stripping away most of their performance gains and leaving biased attention computationally expensive. Surprisingly, despite its common usage, targeted efficiency optimization for attention with bias remains absent, which seriously hinders its application in complex tasks. Diving into the computation of FlashAttention, we prove that its optimal efficiency is…

Taxonomy

Topics: Big Data and Digital Economy · Graph Theory and Algorithms · Stochastic Gradient Optimization Techniques