Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

Camilo Tamayo-Rousseau; Yunjia Zhao; Yiqun Zhang; Randall Balestriero

arXiv:2507.20453 · cs.LG · September 9, 2025

TL;DR

This paper evaluates five self-attention mechanisms (Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine) in Vision Transformers and finds that Doubly Stochastic attention is the most robust to noise and spurious correlations across CIFAR-10, CIFAR-100, and Imagenette.

Contribution

The paper compares five attention variants under several data corruption scenarios and identifies Doubly Stochastic attention as the most robust of them.

Findings

01

Doubly Stochastic attention outperforms the next-best mechanism by 0.1%-5.1% when the training data, or both the training and testing data, are corrupted.

02

Robustness varies significantly across attention mechanisms.

03

The results inform the choice of self-attention mechanism when data are noisy or otherwise imperfect.

Abstract

Self-attention mechanisms are foundational to Transformer architectures, supporting their impressive success in a wide range of tasks. While there are many self-attention variants, their robustness to noise and spurious correlations has not been well studied. This study evaluates Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine attention within Vision Transformers under different data corruption scenarios. Through testing across the CIFAR-10, CIFAR-100, and Imagenette datasets, we show that Doubly Stochastic attention is the most robust. It consistently outperformed the next best mechanism by $0.1\%$ to $5.1\%$ when training data, or both training and testing data, were corrupted. Our findings inform self-attention selection in contexts with imperfect data. The code used is available at https://github.com/ctamayor/NeurIPS-Robustness-ViT.
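
To make the distinction between Softmax and Doubly Stochastic attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that doubly stochastic attention is obtained by Sinkhorn-style alternating row and column normalization of the exponentiated scores, which is one common construction; the abstract does not specify the paper's exact formulation. The function names, the iteration count, and the toy shapes are all illustrative.

```python
import numpy as np

def softmax_attention(q, k):
    """Row-stochastic attention weights: each row (query) sums to 1."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def sinkhorn_attention(q, k, n_iters=100):
    """Approximately doubly stochastic attention weights.

    Assumption: Sinkhorn iterations on exp(scores). n_iters is a
    hypothetical default, not a value taken from the paper.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())  # strictly positive square matrix
    for _ in range(n_iters):
        w = w / w.sum(axis=1, keepdims=True)  # normalize rows
        w = w / w.sum(axis=0, keepdims=True)  # normalize columns
    return w

# Toy example: 6 tokens with 8-dimensional queries and keys.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))

soft = softmax_attention(q, k)
ds = sinkhorn_attention(q, k)

print("softmax column sums: ", soft.sum(axis=0))
print("sinkhorn row sums:   ", ds.sum(axis=1))
print("sinkhorn column sums:", ds.sum(axis=0))
```

In this sketch, Softmax constrains only the rows, so some tokens can receive far more total attention than others. The Sinkhorn variant additionally pushes every column sum toward 1, so each token receives roughly the same total attention. Whether this constraint accounts for the robustness reported in the abstract is not something the sketch can show.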

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning