The Limitations of Adversarial Training and the Blind-Spot Attack
Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh

TL;DR
This paper reveals that adversarial training's robustness is limited by the data manifold and introduces the 'blind-spot attack' targeting low-density regions, exposing vulnerabilities in both empirical and provable defenses, especially in high-dimensional datasets.
Contribution
The paper identifies the existence of blind-spots in adversarial training and provable defenses, demonstrating their impact on robustness in high-dimensional data.
Findings
Blind-spots can be easily found in MNIST by simple transformations.
Blind-spots pose significant challenges for defending high-dimensional datasets like CIFAR and ImageNet.
Provable defenses are also susceptible to blind-spot attacks due to limited robustness certificates.
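The findings above do not say which "simple transformations" are meant. As a minimal sketch, assume a per-pixel scale-and-shift applied to images with values in [0, 1]. The function name and the default values of alpha and beta are hypothetical and chosen only for illustration.

```python
import numpy as np

def scale_and_shift(images, alpha=0.8, beta=0.1):
    """Hypothetical transform x' = alpha * x + beta, clipped to [0, 1].

    Assumption: pixel values are floats in [0, 1]. The result keeps the
    image content recognizable while moving it away from the typical
    intensity statistics of the training set.
    """
    images = np.asarray(images, dtype=np.float64)
    return np.clip(alpha * images + beta, 0.0, 1.0)
```

A transformed test image would then be passed to any standard attack in place of the original; the attack itself is not shown here.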
Abstract
The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. Test examples that are relatively far away from this manifold are more likely to be vulnerable to adversarial attacks. Consequently, an adversarial training based defense is susceptible to a new class of attacks, the "blind-spot attack", where the input images reside in "blind-spots" (low density regions) of the empirical distribution of training data but are still on the ground-truth data manifold. For MNIST, we…
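The excerpt above does not say how the distance between a test point and the embedded training data is computed. One common proxy, assumed here, is the mean Euclidean distance to the k nearest training points in the network's embedding space. The function name and the choice of k are hypothetical, and the embeddings are taken as precomputed arrays.

```python
import numpy as np

def knn_embedding_distance(train_emb, test_emb, k=5):
    """Mean distance from each test embedding to its k nearest training embeddings.

    Assumption: train_emb has shape (n_train, d) and test_emb has shape
    (n_test, d), both extracted from the same layer of the same network.
    Larger values suggest a test point lies in a lower-density region.
    """
    train_emb = np.asarray(train_emb, dtype=np.float64)
    test_emb = np.asarray(test_emb, dtype=np.float64)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Guard against k exceeding the number of training points.
    k = min(k, train_emb.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

Under this proxy, test points with a larger score would be the candidates for lower robustness. The pairwise computation uses O(n_test × n_train × d) memory, so a large training set would need batching or a nearest-neighbor index.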
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Anomaly Detection Techniques and Applications
