Blending Anti-Aliasing into Vision Transformer

Shengju Qian; Hao Shao; Yi Zhu; Mu Li; Jiaya Jia

arXiv:2110.15156 · cs.CV · October 29, 2021

TL;DR

This paper identifies aliasing artifacts in vision transformers caused by patch-wise tokenization and introduces a plug-and-play Aliasing-Reduction Module (ARM) that improves performance, robustness, and data efficiency across multiple tasks.

Contribution

The paper analyzes aliasing in vision transformers, a previously uncharted problem, and proposes a plug-and-play Aliasing-Reduction Module (ARM) that enhances their performance and robustness.

Findings

1. ARM reduces aliasing artifacts effectively.
2. ARM improves accuracy and robustness across multiple vision transformer models.
3. ARM enhances data efficiency in vision transformer applications.

Abstract

Transformer architectures, based on the self-attention mechanism and a convolution-free design, have recently shown superior performance and found booming applications in computer vision. However, the discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps, giving rise to the traditional problem of aliasing for vision transformers. The aliasing effect occurs when discrete patterns are used to produce high-frequency or continuous information, resulting in indistinguishable distortions. Recent research has found that modern convolutional networks still suffer from this phenomenon. In this work, we analyze the uncharted problem of aliasing in vision transformers and explore how to incorporate anti-aliasing properties. Specifically, we propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue. We investigate the effectiveness and…
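The aliasing effect the abstract describes can be illustrated with a generic signal-processing sketch. This is not the paper's ARM, whose design is not given in the text above; it only shows the classical phenomenon in one dimension. The signal length, stride, frequencies, and helper names (`subsample`, `blur_then_subsample`) are all illustrative assumptions. Strided sampling makes a high-frequency cosine indistinguishable from a low-frequency one, while averaging each window before sampling (a simple low-pass filter) suppresses the high-frequency content.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the paper).
N, STRIDE = 64, 8
n = np.arange(N)


def subsample(x, stride=STRIDE):
    """Keep every `stride`-th sample with no filtering (prone to aliasing)."""
    return x[::stride]


def blur_then_subsample(x, stride=STRIDE):
    """Average each non-overlapping window, i.e. box low-pass, then sample."""
    return x.reshape(-1, stride).mean(axis=1)


# A 7-cycle cosine lies above the Nyquist limit of the subsampled grid
# (8 samples, so at most 4 cycles); a 1-cycle cosine lies below it.
high = np.cos(2 * np.pi * 7 * n / N)
low = np.cos(2 * np.pi * 1 * n / N)

# Naive subsampling: the two signals produce the same samples.
aliased_gap = np.max(np.abs(subsample(high) - subsample(low)))

# Low-pass before subsampling: the high-frequency signal is strongly
# attenuated, while the low-frequency one is largely preserved.
high_amp = np.max(np.abs(blur_then_subsample(high)))
low_amp = np.max(np.abs(blur_then_subsample(low)))

print(f"gap between naively subsampled signals: {aliased_gap:.2e}")
print(f"peak amplitude after blur, high frequency: {high_amp:.3f}")
print(f"peak amplitude after blur, low frequency: {low_amp:.3f}")
```

The box average is only one of many possible low-pass filters and is chosen here for brevity; patch-wise tokenization of a 2-D image is loosely analogous to the strided sampling step, which is the connection this sketch is meant to suggest.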

Videos

Blending Anti-Aliasing into Vision Transformer · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Residual Connection · Vision Transformer · Convolution