BiFormer: Vision Transformer with Bi-Level Routing Attention

Lei Zhu; Xinjiang Wang; Zhanghan Ke; Wayne Zhang; Rynson Lau

arXiv:2303.08810 · cs.CV · March 16, 2023 · 66 cites

Open Access · 3 Repos · 1 Model

TL;DR

BiFormer introduces bi-level routing attention, a dynamic, content-aware sparse attention mechanism for vision transformers. It reduces the computation and memory cost of full pairwise attention while maintaining competitive performance across vision tasks.

Contribution

The paper proposes bi-level routing attention, which makes sparse attention in vision transformers flexible and content-aware, improving both efficiency and effectiveness.

Findings

01 Reduces computation and memory usage in vision transformers.

02 Achieves competitive performance on image classification, detection, and segmentation.

03 Demonstrates effectiveness of bi-level routing attention across multiple vision tasks.

Abstract

As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. Specifically, for a query, irrelevant key-value pairs are first filtered out at a coarse region level, and then fine-grained token-to-token attention is applied in the union of remaining candidate regions…
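
The two-stage procedure described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not say how regions are summarized or how many are kept, so mean-pooled region descriptors, a top-k selection, a 1D token sequence, a single head, and the absence of learned projections are all assumptions made here. The names `bi_level_routing_attention`, `region_size`, and `topk` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bi_level_routing_attention(q, k, v, region_size, topk):
    """Sketch of two-level sparse attention over a 1D token sequence.

    q, k, v: lists of token vectors of equal length, divisible by region_size.
    Returns (outputs, routes), where routes[r] lists the regions that
    query region r attends to.
    """
    n = len(q)
    assert n % region_size == 0, "sequence must split evenly into regions"
    num_regions = n // region_size
    assert 1 <= topk <= num_regions

    regions = [range(r * region_size, (r + 1) * region_size)
               for r in range(num_regions)]

    # Coarse level: one descriptor per region (mean pooling is an assumption).
    q_region = [mean_vec([q[i] for i in idx]) for idx in regions]
    k_region = [mean_vec([k[i] for i in idx]) for idx in regions]

    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    outputs = [None] * n
    routes = []

    for r, idx in enumerate(regions):
        # Region-to-region affinity; keep only the topk most relevant regions.
        affinity = [dot(q_region[r], kr) for kr in k_region]
        routed = sorted(range(num_regions),
                        key=lambda j: affinity[j], reverse=True)[:topk]
        routes.append(routed)

        # Fine level: token-to-token attention over the union of routed regions.
        candidates = [i for j in routed for i in regions[j]]
        for i in idx:
            weights = softmax([dot(q[i], k[c]) * scale for c in candidates])
            outputs[i] = [sum(w * v[c][d] for w, c in zip(weights, candidates))
                          for d in range(dim_v)]

    return outputs, routes
```

With `topk` equal to the number of regions, every query attends to all tokens and the sketch reduces to ordinary full attention; smaller values restrict each query to the tokens of its routed regions, which is where the savings in pairwise interactions come from.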

Code & Models

Models

🤗 birder-project/biformer_s_il-all · model · 17 dl

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning