The Limitations of Adversarial Training and the Blind-Spot Attack

Huan Zhang; Hongge Chen; Zhao Song; Duane Boning; Inderjit S. Dhillon; Cho-Jui Hsieh

arXiv:1901.04684·stat.ML·January 28, 2019·62 cites

Open Access

TL;DR

This paper shows that the robustness gained from adversarial training depends on how close a test point lies to the manifold of training data, and introduces the "blind-spot attack", which targets low-density regions of the training distribution. The attack exposes vulnerabilities in both empirical and provable defenses, especially on high-dimensional datasets.

Contribution

The paper identifies the existence of blind-spots in adversarial training and provable defenses, demonstrating their impact on robustness in high-dimensional data.

Findings

01. Blind-spots can be easily found in MNIST by simple transformations.

02. Blind-spots pose significant challenges for defending high-dimensional datasets like CIFAR and ImageNet.

03. Provable defenses are also susceptible to blind-spot attacks due to limited robustness certificates.

Abstract

The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on the test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. Test examples that are relatively far away from this manifold are more likely to be vulnerable to adversarial attacks. Consequently, an adversarial-training-based defense is susceptible to a new class of attacks, the "blind-spot attack", where the input images reside in "blind-spots" (low-density regions) of the empirical distribution of training data but are still on the ground-truth data manifold. For MNIST, we…
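
The distance notion in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes feature vectors have already been extracted from some network layer, and it uses the average Euclidean distance to the k nearest training features as a stand-in for "distance to the training manifold". The names `knn_distance` and `scale_and_shift` are hypothetical, and the pixel-wise affine transform is only one plausible reading of the "simple transformations" mentioned under Findings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_distance(point: Vector, train_features: List[Vector], k: int = 5) -> float:
    """Average Euclidean distance from `point` to its k nearest training features.

    Assumption: this k-NN average is an illustrative proxy for the distance
    between a test point and the embedded training data, not a metric taken
    from the text above.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training features")
    # Sort all distances and average the k smallest ones.
    dists = sorted(math.dist(point, t) for t in train_features)
    return sum(dists[:k]) / k


def scale_and_shift(image: Vector, alpha: float, beta: float) -> List[float]:
    """Apply x' = alpha * x + beta to each pixel, clipped to [0, 1].

    Assumption: a pixel-wise affine map is used here as an example of a
    simple transformation that may move an input toward a low-density region.
    """
    return [min(1.0, max(0.0, alpha * p + beta)) for p in image]
```

Under these assumptions, one would embed a transformed test image with the network, pass the resulting features to `knn_distance`, and compare the value with that of the untransformed image. A larger value suggests the input sits farther from the training data, which is the regime the abstract associates with weaker robustness.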

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Anomaly Detection Techniques and Applications