Practical Black-Box Attacks against Machine Learning
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh, Jha, Z. Berkay Celik, Ananthram Swami

TL;DR
This paper demonstrates a practical black-box attack on remote deep neural networks by training local substitute models solely based on output labels, achieving high misclassification rates without internal model knowledge.
Contribution
It introduces the first real-world black-box attack method that requires only output labels, successfully attacking commercial ML APIs and bypassing existing defenses.
Findings
84.24% success rate on MetaMind DNN
96.19% success rate on Amazon models
88.94% success rate on Google models
Abstract
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
