Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?
Blaine Hoak, Kunyang Li, Patrick McDaniel

TL;DR
This study empirically investigates whether models with human-like perception are more resistant to adversarial attacks, finding that certain alignment measures predict robustness, especially those related to texture and shape perception.
Contribution
The paper provides a large-scale analysis (114 models, 105 benchmarks, robustness measured via AutoAttack) linking representational alignment with adversarial robustness, and identifies specific alignment benchmarks as strong predictors of robustness.
Findings
Average alignment and adversarial robustness are only weakly correlated overall.
Texture and shape alignment benchmarks strongly predict robustness.
Different forms of alignment relate to adversarial robustness to different degrees.
Abstract
A small but growing body of work has shown that machine learning models which better align with human vision have also exhibited higher robustness to adversarial examples, raising the question: can human-like perception make models more secure? If true generally, such mechanisms would offer new avenues toward robustness. In this work, we conduct a large-scale empirical analysis to systematically investigate the relationship between representational alignment and adversarial robustness. We evaluate 114 models spanning diverse architectures and training paradigms, measuring their neural and behavioral alignment and engineering task performance across 105 benchmarks as well as their adversarial robustness via AutoAttack. Our findings reveal that while average alignment and robustness exhibit a weak overall correlation, specific alignment benchmarks serve as strong predictors of adversarial…
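The analysis described in the abstract can be illustrated with a minimal sketch: given one alignment score per model per benchmark and one robustness score per model, compare the correlation of the average alignment against the correlation of each individual benchmark. Everything below is an assumption made for illustration and not the authors' pipeline: the data are synthetic, AutoAttack is not run, the helper names and the `planted` benchmark index are hypothetical, and Spearman rank correlation is chosen here because the summary does not say which statistic the paper uses.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks of a 1-D array (assumes continuous scores, so no tie handling)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def per_benchmark_correlations(alignment, robustness):
    """alignment: (n_models, n_benchmarks); robustness: (n_models,). One correlation per benchmark."""
    n_benchmarks = alignment.shape[1]
    return np.array([spearman(alignment[:, j], robustness) for j in range(n_benchmarks)])


# Synthetic placeholder data, NOT the paper's measurements.
# Only the sizes (114 models, 105 benchmarks) are taken from the abstract.
rng = np.random.default_rng(0)
n_models, n_benchmarks = 114, 105
robustness = rng.uniform(0.0, 1.0, size=n_models)               # stand-in for robust accuracy
alignment = rng.uniform(0.0, 1.0, size=(n_models, n_benchmarks))  # stand-in for alignment scores

# Hypothetical: make one benchmark track robustness closely, to mimic a "strong predictor".
planted = 7
alignment[:, planted] = robustness + rng.normal(0.0, 0.05, size=n_models)

per_bench = per_benchmark_correlations(alignment, robustness)
avg_corr = spearman(alignment.mean(axis=1), robustness)
top = int(np.argmax(per_bench))

print(f"correlation of average alignment with robustness: {avg_corr:.2f}")
print(f"strongest single benchmark: index {top}, correlation {per_bench[top]:.2f}")
```

In this toy setup a single informative benchmark is diluted when averaged with many uninformative ones, which is one way an average can correlate weakly with robustness while an individual benchmark correlates strongly. Whether the same mechanism explains the paper's results is not something this sketch can establish.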
Taxonomy
Topics: Adversarial Robustness in Machine Learning
