Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception

Drew Linsley; Pinyuan Feng; Thibaut Boissin; Alekh Karkada Ashok; Thomas Fel; Stephanie Olaiya; Thomas Serre

arXiv:2306.03229 · cs.CV · June 7, 2023 · 1 citation

Open Access

TL;DR

This paper investigates how the robustness of deep neural networks to adversarial attacks has evolved with their increasing accuracy, revealing a trade-off between attack strength and human perceptual relevance, and proposes training methods to mitigate this issue.

Contribution

It applies the neural harmonizer, a training routine that aligns model features with human perception, and shows that this improves adversarial robustness and interpretability.

Findings

1. Larger DNNs induce more detectable but less human-aligned attacks.
2. Harmonized DNNs produce attacks that are both detectable and human-aligned.
3. Scaling models, data, and training routines can reduce adversarial sensitivity.

Abstract

Deep neural networks (DNNs) are known to have a fundamental sensitivity to adversarial attacks, perturbations of the input that are imperceptible to humans yet powerful enough to change the visual decision of a model. Adversarial attacks have long been considered the "Achilles' heel" of deep learning, which may eventually force a shift in modeling paradigms. Nevertheless, the formidable capabilities of modern large-scale DNNs have somewhat eclipsed these early concerns. Do adversarial attacks continue to pose a threat to DNNs? Here, we investigate how the robustness of DNNs to adversarial attacks has evolved as their accuracy on ImageNet has continued to improve. We measure adversarial robustness in two different ways: First, we measure the smallest adversarial attack needed to cause a model to change its object categorization decision. Second, we measure how aligned successful…
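The first measurement in the abstract, the smallest attack that changes a model's decision, can be illustrated on a toy problem. The sketch below is not the paper's procedure: it assumes a linear multi-class classifier, for which the minimal L2 perturbation has a closed form (the distance to the nearest decision boundary). The names `predict`, `minimal_l2_attack`, and `overshoot`, along with the example weights, are hypothetical and chosen only for illustration.

```python
import math

def predict(weights, biases, x):
    """Return (argmax class, scores) for a linear classifier (toy stand-in for a DNN)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores

def minimal_l2_attack(weights, biases, x, overshoot=1e-6):
    """Smallest L2 perturbation that changes the decision of a linear classifier.

    For each rival class k, the distance from x to the boundary with the
    current class is (f_label - f_k) / ||w_k - w_label||. The nearest
    boundary gives the minimal attack. `overshoot` (an assumption of this
    sketch) pushes the input slightly past the boundary.
    """
    label, scores = predict(weights, biases, x)
    best = None
    for k in range(len(weights)):
        if k == label:
            continue
        w_diff = [wk - wl for wk, wl in zip(weights[k], weights[label])]
        norm = math.sqrt(sum(d * d for d in w_diff))
        if norm == 0.0:
            continue  # identical weight vectors: no boundary to cross
        dist = (scores[label] - scores[k]) / norm
        if best is None or dist < best[0]:
            best = (dist, w_diff, norm)
    if best is None:
        raise ValueError("no rival class with a distinct decision boundary")
    dist, w_diff, norm = best
    step = dist * (1.0 + overshoot) / norm
    delta = [step * d for d in w_diff]
    size = math.sqrt(sum(d * d for d in delta))
    return delta, size

# Hypothetical two-class example in two dimensions.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
x = [2.0, 1.0]

delta, size = minimal_l2_attack(weights, biases, x)
x_adv = [x_i + d_i for x_i, d_i in zip(x, delta)]
print("original class:", predict(weights, biases, x)[0])
print("attacked class:", predict(weights, biases, x_adv)[0])
print("L2 size of attack:", size)
```

In this framing, a larger minimal perturbation means the model is harder to fool with a small change. For a deep network no closed form exists, so the minimal perturbation would have to be estimated with an iterative attack instead.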

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research · Integrated Circuits and Semiconductor Failure Analysis

Methods: ALIGN