Are Transformers More Robust Than CNNs?
Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie

TL;DR
This paper compares Transformers and CNNs on visual recognition under a unified training setup and finds that CNNs can match Transformers' adversarial robustness when trained properly, challenging the earlier belief that Transformers are inherently more robust.
Contribution
It offers the first fair, in-depth robustness comparison between Transformers and CNNs, showing CNNs can match Transformers' robustness with appropriate training methods.
Findings
CNNs can be as robust as Transformers against adversarial attacks when trained properly
Pre-training on large external datasets is not necessary for Transformers to generalize better than CNNs on out-of-distribution samples
Transformers' self-attention-like architecture benefits out-of-distribution generalization
Abstract
Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Nonetheless, surprisingly, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and are applied with distinct training frameworks. In this paper, we aim to provide the first fair & in-depth comparisons between Transformers and CNNs, focusing on robustness evaluations. With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs when measuring adversarial robustness. More surprisingly, we find CNNs can easily be as robust as Transformers on defending against adversarial attacks, if they properly adopt…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
