BOAT: Bilateral Local Attention Vision Transformer

Tan Yu; Gangming Zhao; Ping Li; Yizhou Yu

arXiv:2201.13027 · cs.CV · October 20, 2022 · 22 cites

TL;DR

BOAT is a vision transformer that combines local attention in image space with local attention in feature space, so it can capture relationships between distant patches while keeping the efficiency of local attention.

Contribution

The paper proposes a bilateral local attention mechanism that integrates feature-space and image-space attention, improving the modeling of long-range dependencies in vision transformers.

Findings

01. BOAT-CSWin outperforms existing state-of-the-art models on benchmark datasets.
02. The bilateral attention mechanism improves the capture of relationships between distant patches.
03. Extensive experiments validate the effectiveness of the proposed method.

Abstract

Vision Transformers have achieved outstanding performance in many computer vision tasks. Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is computationally expensive when the number of patches is large. To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows. Although window-based local self-attention significantly boosts efficiency, it fails to capture the relationships between patches that are similar but distant in the image plane. To overcome this limitation of image-space local attention, in this paper, we further exploit the locality of patches in the feature space. We group the patches into multiple clusters using their features, and self-attention is computed within every cluster. Such feature-space local attention effectively captures the connections between…
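The two kinds of locality described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names are hypothetical, attention is single-head with identity query/key/value projections, and the feature-space grouping is a simple stand-in (sort patches along a random projection and split into equal parts), since the abstract does not specify the clustering procedure. The sketch also counts attended patch pairs to show why restricting attention to groups is cheaper than global attention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    """Single-head self-attention over rows of x (identity Q/K/V: a simplification)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def window_groups(h, w, win):
    """Image-space locality: patch indices of each non-overlapping win x win window."""
    idx = np.arange(h * w).reshape(h, w)
    return [idx[i:i + win, j:j + win].ravel()
            for i in range(0, h, win) for j in range(0, w, win)]

def feature_groups(x, n_clusters, seed=0):
    """Feature-space locality (ASSUMED stand-in, not the paper's clustering):
    sort patches along a random projection of their features, split evenly."""
    direction = np.random.default_rng(seed).standard_normal(x.shape[1])
    return np.array_split(np.argsort(x @ direction), n_clusters)

def local_attention(x, groups):
    """Run self-attention independently inside every group of patch indices."""
    out = np.empty_like(x)
    for g in groups:
        out[g] = attention(x[g])
    return out

def attended_pairs(groups):
    """Number of query-key pairs evaluated when attention is limited to groups."""
    return sum(len(g) ** 2 for g in groups)

# An 8 x 8 grid of patches, each with a 16-dimensional feature vector.
x = np.random.default_rng(0).standard_normal((64, 16))
img = window_groups(8, 8, 4)   # 4 windows of 16 spatially adjacent patches
feat = feature_groups(x, 4)    # 4 groups of 16 patches with similar projections
y_img = local_attention(x, img)
y_feat = local_attention(x, feat)

# Global attention vs. the two local variants.
print(attended_pairs([np.arange(64)]), attended_pairs(img), attended_pairs(feat))
# prints: 4096 1024 1024
```

In this sketch a feature-space group can contain patches from anywhere on the grid, which is how feature-space attention can connect patches that no single image-space window covers. How the two outputs are combined inside a transformer block is left out, because the abstract excerpt above does not describe it.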

Code & Models

Repositories

mahaoyuHKU/pytorch-boat
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dropout · Position-Wise Feed-Forward Layer