BViT: Broad Attention based Vision Transformer

Nannan Li; Yaran Chen; Weifan Li; Zixiang Ding; Dongbin Zhao

arXiv:2202.06268·cs.CV·June 12, 2023

Open Access · 1 Repo

TL;DR

BViT introduces a novel broad attention mechanism that leverages multi-layer attention relationships without extra parameters, significantly improving vision transformer performance on image classification and recognition tasks.

Contribution

The paper proposes broad attention with broad connections and parameter-free attention to enhance vision transformers by utilizing multi-layer attention relationships.

Findings

01. Achieves state-of-the-art accuracy on ImageNet with fewer parameters.

02. Outperforms ViT on CIFAR10 and CIFAR100 with fewer parameters.

03. Improves performance of Swin Transformer and T2T-ViT by over 1%.

Abstract

Recent works have demonstrated that transformers can achieve promising performance in computer vision by exploiting the relationships among image patches with self-attention. However, they consider attention only within a single feature layer and ignore the complementarity of attention at different levels. In this paper, we propose broad attention, which improves performance by incorporating the attention relationships of different layers of a vision transformer; the resulting model is called BViT. Broad attention is implemented by a broad connection and parameter-free attention. The broad connection of each transformer layer promotes the transmission and integration of information in BViT. Without introducing additional trainable parameters, parameter-free attention jointly attends to the attention information already available in different layers to extract useful information and build their…
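
The idea in the abstract, gathering attention information from several layers and attending over it without new trainable weights, can be illustrated with a minimal NumPy sketch. The abstract is truncated and does not say how the layers are combined, so the concatenation along the feature axis used here is an assumption, as are the function names `parameter_free_attention` and `broad_attention`, the toy shapes, and the random inputs. This is not the authors' implementation; see the official repository for that.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def parameter_free_attention(q, k, v):
    """Scaled dot-product attention with no projection weights of its own."""
    dim = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(dim)  # (tokens, tokens)
    return softmax(scores, axis=-1) @ v             # (tokens, dim)


def broad_attention(layer_qkv):
    """Attend over queries, keys and values collected from several layers.

    layer_qkv: list of (q, k, v) tuples, one per layer, each (tokens, dim).
    ASSUMPTION: the "broad connection" is modelled as concatenation along
    the feature axis; the source text does not specify this detail.
    """
    q = np.concatenate([qkv[0] for qkv in layer_qkv], axis=-1)
    k = np.concatenate([qkv[1] for qkv in layer_qkv], axis=-1)
    v = np.concatenate([qkv[2] for qkv in layer_qkv], axis=-1)
    return parameter_free_attention(q, k, v)


# Toy stand-ins for the q/k/v tensors each transformer layer already computes.
rng = np.random.default_rng(0)
num_layers, tokens, dim = 3, 5, 8
layer_qkv = [
    tuple(rng.standard_normal((tokens, dim)) for _ in range(3))
    for _ in range(num_layers)
]

broad_out = broad_attention(layer_qkv)
print(broad_out.shape)
```

The sketch reuses tensors that each layer would already produce, so the extra attention step adds compute but no parameters, which is the property the abstract emphasizes. How the result is merged back into the network's output is not covered by the abstract and is left out here.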

Code & Models

Repositories

koala719/bvit
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · CCD and CMOS Imaging Sensors

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Convolution · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Stochastic Depth · Residual Connection