BViT: Broad Attention based Vision Transformer
Nannan Li, Yaran Chen, Weifan Li, Zixiang Ding, Dongbin Zhao

TL;DR
BViT introduces a novel broad attention mechanism that leverages multi-layer attention relationships without extra parameters, significantly improving vision transformer performance on image classification and recognition tasks.
Contribution
The paper proposes broad attention with broad connections and parameter-free attention to enhance vision transformers by utilizing multi-layer attention relationships.
Findings
Achieves state-of-the-art accuracy on ImageNet with fewer parameters.
Outperforms ViT on CIFAR10 and CIFAR100 with fewer parameters.
Improves performance of Swin Transformer and T2T-ViT by over 1%.
Abstract
Recent works have demonstrated that transformers can achieve promising performance in computer vision by exploiting the relationships among image patches with self-attention. However, they only consider attention within a single feature layer and ignore the complementarity of attention at different levels. In this paper, we propose broad attention, which improves performance by incorporating the attention relationships of different layers in a vision transformer; the resulting model is called BViT. Broad attention is implemented by a broad connection and parameter-free attention. The broad connection of each transformer layer promotes the transmission and integration of information in BViT. Without introducing additional trainable parameters, parameter-free attention jointly focuses on the attention information already available in different layers for extracting useful information and building their…
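The abstract names two components, a broad connection and parameter-free attention, but does not spell out the computation. The sketch below is one plausible reading, not the authors' implementation. It assumes that the broad connection concatenates each layer's query, key, and value matrices along the feature axis, and that parameter-free attention is plain scaled dot-product attention applied to those concatenations with no new weights. All function names and the toy values are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # (n x d) times (d x m) for matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def parameter_free_attention(q, k, v):
    # Scaled dot-product attention; no trainable projections are applied.
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)

def concat_features(per_layer):
    # Assumed form of the broad connection: join the per-layer matrices
    # token by token along the feature axis.
    n_tokens = len(per_layer[0])
    return [[x for layer in per_layer for x in layer[t]] for t in range(n_tokens)]

def broad_attention(layer_qkv):
    # layer_qkv: one (Q, K, V) triple per transformer layer, each of
    # shape (tokens x features). These are reused, not recomputed.
    qs, ks, vs = zip(*layer_qkv)
    return parameter_free_attention(
        concat_features(qs), concat_features(ks), concat_features(vs)
    )

# Toy values: 2 layers, 3 tokens, 2 features per layer.
x1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x2 = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
out = broad_attention([(x1, x1, x1), (x2, x2, x2)])
print(len(out), len(out[0]))  # 3 tokens, 4 concatenated features
```

Because the sketch only reuses the Q, K, and V matrices that each layer already produces, it adds no trainable parameters, which matches the abstract's claim. How the result is merged with the backbone's output is not covered by the text above and is left out here.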
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Convolution · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Stochastic Depth · Residual Connection
