Deep Transformers Thirst for Comprehensive-Frequency Data
Rui Xia, Chao Xue, Boyu Deng, Fang Wang, Jingchao Wang

TL;DR
This paper introduces EIT, a pyramid-free vision transformer that efficiently incorporates inductive bias (IB). IB increases the share of high-frequency data in each layer, and EIT achieves state-of-the-art performance on ImageNet-1K with a simpler structure.
Contribution
The paper proposes EIT, a novel pyramid-free transformer that efficiently introduces inductive bias through a decreasing convolutional structure. This increases the share of high-frequency data in each layer and lets EIT outperform existing models.
Findings
EIT achieves state-of-the-art results on ImageNet-1K.
Introducing IB increases the share of high-frequency data in each layer.
A pyramid-free structure simplifies model design.
Abstract
Current research indicates that inductive bias (IB) can improve Vision Transformer (ViT) performance. However, these methods simultaneously introduce a pyramid structure to offset the additional FLOPs and parameters that IB brings. This structure breaks the unification of computer vision and natural language processing (NLP) and complicates the model. We study an NLP model called LSRA, which introduces IB with a pyramid-free structure, and analyze why it outperforms ViT. We find that introducing IB increases the share of high-frequency data in each layer, giving "attention" to more information. As a result, the heads notice more diverse information, which improves performance. To further explore the potential of transformers, we propose EIT, which Efficiently introduces IB to ViT with a novel decreasing convolutional structure under a pyramid-free structure. EIT achieves…
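The abstract's analysis depends on measuring how much high-frequency content each layer's features carry. The exact metric is not given in this excerpt, so the sketch below assumes one common choice: reshape a layer's tokens into an H×W grid and compute the fraction of 2D Fourier power above a radial frequency cutoff. The names `high_frequency_share` and `box_blur` and the `cutoff=0.25` threshold are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def high_frequency_share(feature_map: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral power above a radial frequency cutoff.

    feature_map: 2D array (H, W), e.g. one channel of a layer's tokens
    reshaped from (N, C) back onto the patch grid (an assumption here).
    cutoff: hypothetical threshold in cycles/sample (0.5 is Nyquist).
    """
    h, w = feature_map.shape
    # Remove the mean so the DC term does not dominate the total power.
    centered = feature_map - feature_map.mean()
    power = np.abs(np.fft.fft2(centered)) ** 2
    # Frequency of every FFT bin along each axis, in cycles per sample.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)  # shape (h, w) via broadcasting
    total = power.sum()
    if total <= 0.0:
        return 0.0  # a constant map has no non-DC content at all
    return float(power[radius > cutoff].sum() / total)


def box_blur(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter with circular padding: a simple low-pass filter."""
    shifted = (
        np.roll(np.roll(x, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )
    return sum(shifted) / 9.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((16, 16))
    # A low-pass operation should shrink the high-frequency share.
    print("noise:  ", high_frequency_share(noise))
    print("blurred:", high_frequency_share(box_blur(noise)))
```

Under this metric, a ±1 checkerboard scores close to 1 and a smooth low-frequency wave scores close to 0. Tracking the value per layer would be one way to compare a model with and without IB, though the paper may define "high-frequency data" differently.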
Taxonomy
Topics: Advanced Memory and Neural Computing · Infrared Target Detection Methodologies · CCD and CMOS Imaging Sensors
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Vision Transformer · 1x1 Convolution · Dropout · Dense Connections · Average Pooling · Residual Connection
