Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang

TL;DR
This paper investigates the robustness of Vision Transformers (ViTs) to common corruptions, revealing that simple architectural modifications and advanced augmentation strategies can significantly enhance their resilience.
Contribution
The paper presents what its authors describe as the first attempt to analyze how architectural choices and data augmentation affect the robustness of ViTs to common corruptions, and it proposes a dynamic augmentation method to improve that robustness.
Findings
Overlapping patch embedding and convolutional FFN improve robustness.
Adversarial noise training outperforms Fourier-domain augmentation.
Proposed dynamic augmentation achieves state-of-the-art robustness.
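To make the first finding concrete, the toy sketch below contrasts non-overlapping and overlapping patch extraction. It is an illustration only, not the paper's implementation: the helper names `patch_grid` and `extract_patches`, the 4x4 input, and the patch and stride values are all assumptions chosen for readability. In practice, ViT variants usually implement patch embedding as a strided convolution, and overlap arises when the stride is smaller than the kernel size.

```python
def patch_grid(image_size, patch_size, stride, padding=0):
    """Patches per side for a square image (standard conv output-size formula)."""
    return (image_size + 2 * padding - patch_size) // stride + 1


def extract_patches(image, patch_size, stride):
    """Slice a 2D list into patch_size x patch_size patches taken every `stride` pixels."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


# Hypothetical 4x4 "image" whose pixel values are simply their flat indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Stride equal to the patch size: every pixel belongs to exactly one patch.
non_overlapping = extract_patches(image, patch_size=2, stride=2)

# Stride smaller than the patch size: neighbouring patches share pixels.
overlapping = extract_patches(image, patch_size=2, stride=1)

print(len(non_overlapping), len(overlapping))
print(patch_grid(224, 16, 16), patch_grid(224, 7, 4, padding=3))
```

With overlap, each interior pixel contributes to several tokens, so adjacent tokens share local context before any attention layer is applied. The 224-pixel configurations in the last line are illustrative values, not settings taken from the paper.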
Abstract
With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy. However, it remains unclear how their unique architectures impact robustness towards common corruptions. In this paper, we make the first attempt to probe into the robustness gap among ViT variants and explore underlying designs that are essential for robustness. Through extensive and rigorous benchmarking, we demonstrate that simple architecture designs such as overlapping patch embedding and convolutional feed-forward network (FFN) can promote the robustness of ViTs. Moreover, since training ViTs relies heavily on data augmentation, it is worth investigating whether previous CNN-based augmentation strategies that target robustness remain useful. We explore…
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors
