Deep Transformers Thirst for Comprehensive-Frequency Data

Rui Xia; Chao Xue; Boyu Deng; Fang Wang; Jingchao Wang

arXiv:2203.07116 · cs.CV · November 18, 2022

TL;DR

This paper introduces EIT, a pyramid-free transformer that efficiently introduces inductive bias (IB). Introducing IB increases the share of high-frequency data each layer attends to, and EIT reports state-of-the-art performance on ImageNet-1K with a simpler structure.

Contribution

The paper proposes EIT, a pyramid-free transformer that efficiently introduces inductive bias through a decreasing convolutional structure, raising the share of high-frequency data in each layer; it is reported to outperform existing models.

Findings

01  EIT achieves state-of-the-art results on ImageNet-1K.

02  Introducing IB increases the share of high-frequency data in each layer.

03  The pyramid-free structure simplifies model design.

Abstract

Current research indicates that inductive bias (IB) can improve Vision Transformer (ViT) performance. However, these works also introduce a pyramid structure to offset the additional FLOPs and parameters that IB brings. This structure breaks the unification of computer vision and natural language processing (NLP) and complicates the model. We study an NLP model called LSRA, which introduces IB with a pyramid-free structure. We analyze why it outperforms ViT and find that introducing IB increases the share of high-frequency data in each layer, giving "attention" to more information. As a result, the heads notice more diverse information and the model performs better. To further explore the potential of transformers, we propose EIT, which Efficiently introduces IB to ViT with a novel decreasing convolutional structure under a pyramid-free structure. EIT achieves…
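The abstract's central observation is about the "share of high-frequency data" in a layer. The excerpt here does not say how the authors measure that share, so the following is only a minimal sketch of one common way to quantify it: take the 2D Fourier transform of a single-channel feature map and compute the fraction of spectral energy above a radial frequency cutoff. The function name `high_frequency_share`, the `cutoff` value, and the choice to remove the mean (DC component) first are all assumptions made for illustration, not the paper's procedure.

```python
import numpy as np


def high_frequency_share(feature_map, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff.

    Hypothetical helper for illustration; not the paper's metric.
    `feature_map` is a 2D array (one channel of a layer's output).
    `cutoff` is in cycles per sample (0.5 is the per-axis Nyquist limit).
    """
    feature_map = np.asarray(feature_map, dtype=np.float64)
    h, w = feature_map.shape

    # Assumption: drop the mean so a constant offset does not count as energy.
    centered = feature_map - feature_map.mean()
    energy = np.abs(np.fft.fft2(centered)) ** 2

    # Radial frequency of every FFT bin, in cycles per sample.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    total = energy.sum()
    if total == 0.0:
        return 0.0  # a constant map has no non-DC energy at all
    return float(energy[radius > cutoff].sum() / total)


# Two synthetic 16x16 maps: one smooth, one rapidly alternating.
rows, cols = np.indices((16, 16))
smooth = np.sin(2 * np.pi * cols / 16)             # one cycle across the width
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)  # sign flips every pixel

smooth_share = high_frequency_share(smooth)
checker_share = high_frequency_share(checker)
```

Under this measure the checkerboard map sits almost entirely above the cutoff while the smooth sinusoid sits almost entirely below it. Applying the same function to feature maps from successive layers would give a per-layer curve of the kind the abstract alludes to, though the cutoff and normalization would need to match whatever definition the paper actually uses.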

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

mrhaipi/eit (PyTorch, official)

Taxonomy

Topics: Advanced Memory and Neural Computing · Infrared Target Detection Methodologies · CCD and CMOS Imaging Sensors

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Vision Transformer · 1x1 Convolution · Dropout · Dense Connections · Average Pooling · Residual Connection