BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson Lau

TL;DR
BiFormer introduces a dynamic, content-aware sparse attention mechanism with bi-level routing in vision transformers, significantly reducing computation and memory costs while maintaining high performance across vision tasks.
Contribution
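The paper proposes bi-level routing attention, a dynamic sparse attention method for vision transformers: key-value pairs are first filtered at the region level, and token-to-token attention is then applied only within the regions that remain, so the sparsity pattern depends on image content rather than on a fixed, handcrafted layout.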
Findings
Reduces computation and memory usage in vision transformers.
Achieves competitive performance on image classification, detection, and segmentation.
Demonstrates effectiveness of bi-level routing attention across multiple vision tasks.
Abstract
As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. Specifically, for a query, irrelevant key-value pairs are first filtered out at a coarse region level, and then fine-grained token-to-token attention is applied in the union of remaining candidate regions…
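The two-step procedure described in the abstract (coarse region-level filtering, then fine-grained attention over the union of the surviving regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions that the excerpt above does not state: tokens are a flat sequence split into equal contiguous regions (real images would use 2D windows), each region is summarized by mean-pooling its queries and keys, and the `topk` highest-affinity regions are kept. The names `bilevel_routing_attention`, `num_regions`, and `topk` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bilevel_routing_attention(q, k, v, num_regions, topk):
    """Sketch of two-level sparse attention.

    q, k, v: arrays of shape (N, d); N must be divisible by num_regions.
    Returns the attended output (N, d) and the routed region indices (R, topk).
    """
    n, d = q.shape
    t = n // num_regions  # tokens per region

    # Split the flat token sequence into contiguous regions: (R, T, d).
    qr = q.reshape(num_regions, t, d)
    kr = k.reshape(num_regions, t, d)
    vr = v.reshape(num_regions, t, d)

    # Coarse level (assumption): summarize each region by mean pooling,
    # then score every query region against every key region.
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T  # (R, R)

    # Keep only the topk most relevant key regions per query region.
    routed = np.argsort(-affinity, axis=1)[:, :topk]  # (R, topk)

    # Fine level: gather keys/values from the union of routed regions.
    k_sel = kr[routed].reshape(num_regions, topk * t, d)
    v_sel = vr[routed].reshape(num_regions, topk * t, d)

    # Standard scaled dot-product attention, restricted to gathered tokens.
    scores = qr @ k_sel.transpose(0, 2, 1) / np.sqrt(d)  # (R, T, topk*T)
    out = softmax(scores, axis=-1) @ v_sel  # (R, T, d)
    return out.reshape(n, d), routed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
    out, routed = bilevel_routing_attention(q, k, v, num_regions=4, topk=2)
    print(out.shape, routed.shape)
```

In this sketch each query attends to `topk * T` tokens instead of all `N`, which is where the savings come from. Setting `topk` equal to `num_regions` recovers ordinary full attention, which makes a convenient sanity check.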
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
