Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon

TL;DR
This paper empirically compares the adversarial robustness of Vision Transformers (ViTs) and MLP-Mixers with that of CNNs. It finds that ViT is generally more robust than CNNs, that frequency features influence robustness, and that MLP-Mixer is highly vulnerable to universal adversarial perturbations.
Contribution
It provides the first comprehensive empirical evaluation of adversarial robustness across ViT, MLP-Mixer, and CNN architectures, highlighting how they differ and which factors underlie those differences.
Findings
ViT is more robust than CNNs against adversarial attacks.
MLP-Mixer is extremely vulnerable to universal adversarial perturbations.
Low-frequency features contribute to ViT's robustness.
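The summary does not say how the frequency analysis was carried out. One common way to probe such a claim is to split a perturbation (or an image) into low- and high-frequency components in the Fourier domain and evaluate a model on each part separately. The following is a minimal sketch of that split with NumPy's FFT. It assumes a single-channel 2-D array and a circular low-pass mask whose `radius` cutoff is a hypothetical parameter. It illustrates the idea and is not the authors' protocol.

```python
import numpy as np

def frequency_split(delta, radius):
    """Split a 2-D (H, W) array into low- and high-frequency parts.

    `radius` is a hypothetical cutoff: Fourier coefficients within this
    distance of the (shifted) zero-frequency bin are kept as "low".
    """
    h, w = delta.shape
    spec = np.fft.fftshift(np.fft.fft2(delta))  # move the DC term to the centre
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius  # circular low-pass mask
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = delta - low  # the remainder is the high-frequency part
    return low, high

# Usage: split a random perturbation bounded by 8/255.
rng = np.random.default_rng(0)
delta = rng.uniform(-8 / 255, 8 / 255, size=(32, 32))
low, high = frequency_split(delta, radius=4)
print(np.abs(low).max(), np.abs(high).max())
```

Defining `high` as the residual guarantees that the two parts sum exactly to the original array, so a robustness gap between them can be attributed to frequency content alone.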
Abstract
Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications over the past years. Recently, however, new model architectures have been proposed that challenge the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs are widely known to be vulnerable to adversarial attacks, causing serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against the widely used CNNs. Overall, we find that the two architectures, especially…
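The truncated abstract does not list the attack setups. As an illustration only, here is a sketch of two standard L∞ settings. The first is per-sample projected gradient descent (PGD), where each input gets its own perturbation. The second is a universal perturbation, where a single perturbation is shared by all inputs. The target model is a hypothetical toy linear softmax classifier, chosen because its input gradient has a closed form. The values of `eps`, `alpha`, and `steps`, the toy model, and all function names are assumptions, not the paper's configuration.

```python
import numpy as np

# Hypothetical toy setup: a linear softmax classifier logits = x @ W + b,
# chosen only because its input gradient has a closed form.

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(x, y, W, b):
    """Per-sample cross-entropy of the toy classifier."""
    return -log_softmax(x @ W + b)[np.arange(len(y)), y]

def input_grad(x, y, W, b):
    """Row i is the gradient of sample i's cross-entropy w.r.t. x[i]."""
    p = np.exp(log_softmax(x @ W + b))
    p[np.arange(len(y)), y] -= 1.0  # softmax minus one-hot label
    return p @ W.T

def pgd_linf(x, y, W, b, eps=0.03, alpha=0.01, steps=10):
    """Per-sample L-inf PGD: every input gets its own perturbation."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = input_grad(np.clip(x + delta, 0.0, 1.0), y, W, b)
        # Signed ascent step, then project back into the eps-ball.
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return np.clip(x + delta, 0.0, 1.0)

def universal_linf(x, y, W, b, eps=0.03, alpha=0.01, steps=10):
    """Universal L-inf perturbation: ONE delta shared by all inputs,
    updated with the sign of the batch-averaged input gradient."""
    delta = np.zeros(x.shape[1])
    for _ in range(steps):
        g = input_grad(np.clip(x + delta, 0.0, 1.0), y, W, b).mean(axis=0)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return delta

# Usage on random data; labels are the clean predictions.
rng = np.random.default_rng(0)
n, d, k = 64, 32, 10
W = rng.normal(size=(d, k))
b = np.zeros(k)
x = rng.uniform(0.2, 0.8, size=(n, d))
y = (x @ W + b).argmax(axis=1)

x_adv = pgd_linf(x, y, W, b)
delta_u = universal_linf(x, y, W, b)
clean_acc = np.mean((x @ W + b).argmax(axis=1) == y)
pgd_acc = np.mean((x_adv @ W + b).argmax(axis=1) == y)
uap_acc = np.mean((np.clip(x + delta_u, 0.0, 1.0) @ W + b).argmax(axis=1) == y)
print(clean_acc, pgd_acc, uap_acc)
```

The universal variant averages gradients over the batch before taking the sign, so one perturbation must fool many inputs at once. This is the setting in which the summary reports MLP-Mixer to be highly vulnerable.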
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Integrated Circuits and Semiconductor Failure Analysis
Methods: Attention Is All You Need · Linear Layer · Average Pooling · Global Average Pooling · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · MLP-Mixer · Dropout
