Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks
Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz

TL;DR
This paper introduces an active defense mechanism against DNN model stealing attacks that perturbs predictions to poison the attacker's training process, effectively reducing attack success across various datasets and models.
Contribution
It presents the first active perturbation-based defense against DNN stealing, outperforming passive defenses and maintaining high utility for legitimate users.
Findings
Effective against a wide range of attacks
Amplifies attacker's error rate up to 85 times
Maintains utility for benign users
Abstract
High-performance Deep Neural Networks (DNNs) are increasingly deployed in many real-world applications e.g., cloud prediction APIs. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such applications, which require a lot of time, money, and effort to develop. Existing defenses take a passive role against stealing attacks, such as by truncating predicted information. We find such passive defenses ineffective against DNN stealing attacks. In this paper, we propose the first defense which actively perturbs predictions targeted at poisoning the training objective of the attacker. We find our defense effective across a wide range of challenging datasets and DNN model stealing attacks, and additionally outperforms existing defenses. Our defense is the first that can withstand highly accurate model…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Anomaly Detection Techniques and Applications
