Alignment and Adversarial Robustness: Are More Human-Like Models More Secure?

Blaine Hoak; Kunyang Li; Patrick McDaniel

arXiv:2502.12377·cs.CV·July 15, 2025

TL;DR

This study empirically investigates whether models with human-like perception are more resistant to adversarial attacks, finding that certain alignment measures predict robustness, especially those related to texture and shape perception.

Contribution

The paper provides a large-scale analysis (114 models, 105 benchmarks, robustness measured with AutoAttack) linking representational alignment with adversarial robustness, and identifies specific alignment benchmarks as strong predictors of robustness.

Findings

1. Average alignment and adversarial robustness are only weakly correlated overall.

2. Texture and shape alignment benchmarks strongly predict robustness.

3. Different forms of alignment influence model security differently.

Abstract

A small but growing body of work has shown that machine learning models which better align with human vision have also exhibited higher robustness to adversarial examples, raising the question: can human-like perception make models more secure? If true generally, such mechanisms would offer new avenues toward robustness. In this work, we conduct a large-scale empirical analysis to systematically investigate the relationship between representational alignment and adversarial robustness. We evaluate 114 models spanning diverse architectures and training paradigms, measuring their neural and behavioral alignment and engineering task performance across 105 benchmarks as well as their adversarial robustness via AutoAttack. Our findings reveal that while average alignment and robustness exhibit a weak overall correlation, specific alignment benchmarks serve as strong predictors of adversarial…
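The kind of analysis the abstract describes, comparing the correlation of average alignment with robustness against the correlation of each individual benchmark, can be sketched as follows. This is a minimal illustration and not the authors' code: the data is synthetic, the choice of Pearson correlation is an assumption (the abstract does not name the statistic), and `pearson_per_benchmark` is a hypothetical helper. Only the array shape (114 models by 105 benchmarks) is taken from the abstract.

```python
import numpy as np


def pearson_per_benchmark(scores, robustness):
    """Pearson correlation of each benchmark column with robustness.

    scores:     array of shape (n_models, n_benchmarks)
    robustness: array of shape (n_models,), e.g. robust accuracy
    Returns an array of shape (n_benchmarks,).
    """
    scores = np.asarray(scores, dtype=float)
    robustness = np.asarray(robustness, dtype=float)
    s = scores - scores.mean(axis=0)        # center each benchmark column
    r = robustness - robustness.mean()      # center the robustness vector
    denom = np.sqrt((s ** 2).sum(axis=0) * (r ** 2).sum())
    return (s * r[:, None]).sum(axis=0) / denom


# Synthetic stand-in data (NOT results from the paper).
rng = np.random.default_rng(0)
n_models, n_benchmarks = 114, 105
robustness = rng.uniform(0.0, 1.0, size=n_models)
scores = rng.normal(size=(n_models, n_benchmarks))
# Hypothetical: make benchmark 0 track robustness, leave the rest as noise.
scores[:, 0] = robustness + 0.1 * rng.normal(size=n_models)

# Correlation of every individual benchmark with robustness.
per_bench = pearson_per_benchmark(scores, robustness)

# Correlation of the average alignment score with robustness.
avg_alignment = scores.mean(axis=1, keepdims=True)
avg_corr = pearson_per_benchmark(avg_alignment, robustness)[0]

# Benchmarks ranked by absolute correlation, strongest first.
top = np.argsort(-np.abs(per_bench))[:3]
print("average-alignment correlation:", avg_corr)
print("strongest benchmarks (index, r):", [(int(i), float(per_bench[i])) for i in top])
```

In this toy setup a single informative benchmark is diluted when averaged with many uninformative ones, which is one way a weak average correlation and strong per-benchmark predictors can coexist. Whether that mechanism explains the paper's findings is not something this sketch can establish.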

Taxonomy

Topics: Adversarial Robustness in Machine Learning