TL;DR
This paper introduces ZOO, a black-box attack method for deep neural networks that directly estimates gradients without training substitute models, achieving high attack success rates comparable to white-box attacks.
Contribution
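The paper proposes a zeroth order optimization (ZOO) approach for black-box adversarial attacks. It estimates the gradients of the target model directly from the model's outputs, which removes the need to train a substitute model and avoids the loss in attack effectiveness that comes from relying on transferability.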
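As a rough illustration of what zeroth order gradient estimation means, the sketch below approximates each partial derivative with a symmetric difference quotient, using only function evaluations. It is a minimal sketch under stated assumptions, not the authors' implementation: `estimate_gradient`, `black_box_loss`, the step size `h`, and the toy quadratic loss are all hypothetical names and values chosen for this example. In a real attack the loss would be computed from the target model's output scores.

```python
import numpy as np

def estimate_gradient(f, x, h=1e-4, coords=None):
    """Estimate the gradient of f at x from function values only.

    Uses the symmetric difference quotient per coordinate:
        df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2*h)
    `coords` optionally restricts the estimate to a subset of coordinates;
    all other entries of the returned gradient stay zero.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    if coords is None:
        coords = range(x.size)
    for i in coords:
        step = np.zeros_like(x)
        step.flat[i] = h  # perturb only coordinate i
        grad.flat[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def black_box_loss(x):
    # Hypothetical stand-in for a loss built from a model's outputs.
    # Only its value is queried; its internals are never inspected.
    return float(np.sum(x ** 2))

x0 = np.array([1.0, -2.0, 0.5])
full_grad = estimate_gradient(black_box_loss, x0)
partial_grad = estimate_gradient(black_box_loss, x0, coords=[1])
```

Each coordinate costs two queries to the black box, so restricting the estimate to a few coordinates per step (as `coords` allows here) is one way to keep the query count manageable on high-dimensional inputs such as images.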
Findings
The ZOO attack achieves success rates comparable to the state-of-the-art white-box attack (Carlini and Wagner's attack).
ZOO significantly outperforms existing black-box attacks based on substitute models.
The method is effective on MNIST, CIFAR10, and ImageNet.
Abstract
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of…
