Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley, Pinyuan Feng, Thibaut Boissin, Alekh Karkada Ashok, Thomas Fel, Stephanie Olaiya, Thomas Serre

TL;DR
This paper investigates how the robustness of deep neural networks to adversarial attacks has evolved with their increasing accuracy, revealing a trade-off between attack strength and human perceptual relevance, and proposes training methods to mitigate this issue.
Contribution
It introduces the neural harmonizer training routine that aligns model features with human perception, improving adversarial robustness and interpretability.
Findings
Larger DNNs induce more detectable but less human-aligned attacks.
Harmonized DNNs produce attacks that are both detectable and human-aligned.
Scaling models, data, and training routines can reduce adversarial sensitivity.
Abstract
Deep neural networks (DNNs) are known to have a fundamental sensitivity to adversarial attacks, perturbations of the input that are imperceptible to humans yet powerful enough to change the visual decision of a model. Adversarial attacks have long been considered the "Achilles' heel" of deep learning, which may eventually force a shift in modeling paradigms. Nevertheless, the formidable capabilities of modern large-scale DNNs have somewhat eclipsed these early concerns. Do adversarial attacks continue to pose a threat to DNNs? Here, we investigate how the robustness of DNNs to adversarial attacks has evolved as their accuracy on ImageNet has continued to improve. We measure adversarial robustness in two different ways: First, we measure the smallest adversarial attack needed to cause a model to change its object categorization decision. Second, we measure how aligned successful…
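The first robustness measure in the abstract, the smallest attack needed to change a model's decision, can be illustrated with a toy example. The sketch below is not the authors' procedure, since the excerpt does not say which attack they use. It assumes a hypothetical two-feature linear classifier and a fixed attack direction, and bisects for the smallest L2 step along that direction that flips the prediction. The names `smallest_flip_norm`, `predict`, `w`, and `b` are illustrative only.

```python
import numpy as np


def smallest_flip_norm(predict, x, direction, max_norm=10.0, tol=1e-6):
    """Bisect for the smallest L2 step along `direction` that changes predict(x).

    Assumes the decision flips at most once along the ray, which holds for
    the linear toy model below but not for DNNs in general.
    """
    unit = direction / np.linalg.norm(direction)
    original = predict(x)
    if predict(x + max_norm * unit) == original:
        return None  # no decision change within the search budget
    lo, hi = 0.0, max_norm
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * unit) == original:
            lo = mid  # still the original class: step further
        else:
            hi = mid  # decision changed: try a smaller step
    return hi


# Hypothetical linear classifier: class 1 if w.x + b > 0, else class 0.
w = np.array([3.0, 4.0])
b = -1.0


def predict(x):
    return int(w @ x + b > 0)


x = np.array([1.0, 1.0])  # score = 3 + 4 - 1 = 6, so class 1
eps = smallest_flip_norm(predict, x, direction=-w)

# For a linear model the minimal L2 perturbation has the closed form
# |w.x + b| / ||w||, which the bisection result should approximate.
closed_form = abs(w @ x + b) / np.linalg.norm(w)
print(eps, closed_form)
```

For a real DNN the attack direction would itself have to be optimized, for example with a gradient-based attack, and the single-flip assumption would no longer be guaranteed. The toy case only makes concrete what "smallest attack" means as a number.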
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research · Integrated Circuits and Semiconductor Failure Analysis
Methods: ALIGN
