High Accuracy and High Fidelity Extraction of Neural Networks
Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot

TL;DR
This paper introduces practical methods for extracting neural network models with high accuracy and high fidelity, explains the inherent limitations that prevent learning-based attacks from achieving perfect fidelity, and demonstrates the feasibility of direct, functionally-equivalent extraction on real-world systems.
Contribution
It presents the first practical attack for direct, functionally-equivalent extraction of neural network weights, expanding on prior work and demonstrating real-world applicability.
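The critical-point idea behind direct extraction of ReLU networks can be illustrated on a toy model. This is a minimal sketch, not the authors' implementation: it assumes a hypothetical network with one hidden layer, a scalar noise-free real-valued output, and a nonzero weight from the first input coordinate into each hidden neuron; the names `oracle`, `find_kinks`, `slope_jump` and `recover_row`, and the line-intersection refinement, are illustrative choices. Along a line through input space the network is piecewise linear, so its slope changes exactly where a hidden neuron switches on or off. At such a critical point, the jump in the directional derivative along each coordinate reveals that neuron's incoming weights up to a common scale.

```python
import numpy as np

# Hypothetical secret victim: a 3-2-1 ReLU network. The attack below only calls oracle().
W1 = np.array([[1.0, -2.0, 0.5],
               [-1.5, 0.5, 1.0]])
b1 = np.array([0.3, -0.2])
w2 = np.array([2.0, -1.0])
b2 = 0.1
H = 1e-6  # finite-difference step, assumed much smaller than the spacing between kinks


def oracle(x):
    """Black-box query returning the network's real-valued output."""
    return float(w2 @ np.maximum(W1 @ x + b1, 0.0) + b2)


def slope(f, a, b):
    """Average slope of f on [a, b]; exact when f is linear there."""
    return (f(b) - f(a)) / (b - a)


def find_kinks(g, lo, hi, depth=0):
    """Locate points in (lo, hi) where the piecewise-linear g changes slope."""
    s_l, s_r = slope(g, lo, lo + H), slope(g, hi - H, hi)
    if abs(s_l - s_r) < 1e-6 or depth > 60:
        return []  # assumes slope changes never cancel out exactly
    # Candidate: where the line through lo (slope s_l) meets the line through hi (slope s_r).
    t = (g(hi) - g(lo) + s_l * lo - s_r * hi) / (s_l - s_r)
    if (lo + 2 * H < t < hi - 2 * H
            and abs(slope(g, t - 2 * H, t - H) - s_l) < 1e-4
            and abs(slope(g, t + H, t + 2 * H) - s_r) < 1e-4):
        return [t]  # a single kink at t explains both end slopes
    mid = 0.5 * (lo + hi)
    return find_kinks(g, lo, mid, depth + 1) + find_kinks(g, mid, hi, depth + 1)


def slope_jump(x, v):
    """Right slope minus left slope of the oracle along direction v at point x."""
    g = lambda s: oracle(x + s * v)
    return slope(g, H, 2 * H) - slope(g, -2 * H, -H)


def recover_row(x):
    """Incoming weights of the neuron that is critical at x, scaled so entry 0 is 1."""
    eye = np.eye(x.size)
    jumps = np.array([slope_jump(x, e) for e in eye])  # w2_i * |W1[i, j]| for critical neuron i
    row = np.abs(jumps / jumps[0])
    for j in range(1, x.size):
        # |w2_i| * |W1[i, 0] + W1[i, j]| equals |a| + |b| for equal signs, ||a| - |b|| otherwise.
        both = abs(slope_jump(x, eye[0] + eye[j]))
        a, b = abs(jumps[0]), abs(jumps[j])
        if abs(both - abs(a - b)) < abs(both - (a + b)):
            row[j] = -row[j]
    return row


x0, d = np.zeros(3), np.array([1.0, 0.0, 0.0])  # query line x0 + t * d
kinks = sorted(find_kinks(lambda t: oracle(x0 + t * d), -1.0, 1.0))
rows = [recover_row(x0 + t * d) for t in kinks]
for t, r in zip(kinks, rows):
    print(f"kink at t = {t:.4f}: row proportional to {np.round(r, 4)}")
```

On this example the two kinks, at t = -0.3 and t = -2/15, belong to the two hidden neurons, and each recovered row equals the corresponding row of `W1` divided by its first entry. Recovering the overall sign and scale of each row, the biases, and the output layer needs further queries that this sketch omits; it also does not handle deeper networks or label-only outputs.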
Findings
High-accuracy extraction using a learning-based approach.
Inherent limitations prevent perfect fidelity in learning-based methods.
Practical direct extraction attack demonstrated on large-scale image classifier.
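A learning-based attack uses the victim's answers as supervision for training the extracted model. The toy sketch below shows that loop under stated assumptions: the victim is a hypothetical logistic-regression classifier that returns class probabilities, queries are random Gaussian inputs, and the extracted model is trained by full-batch gradient descent on the victim's soft labels. It is a simplified stand-in, not the paper's training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5

# Hypothetical victim: a logistic-regression classifier the attacker can only query.
w_victim = rng.normal(size=D)
b_victim = 0.5


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def victim_proba(X):
    """Oracle access: P(class 1) for each row of X."""
    return sigmoid(X @ w_victim + b_victim)


# 1. Query the victim on attacker-chosen inputs (here, Gaussian noise).
X_query = rng.normal(size=(2000, D))
p_victim = victim_proba(X_query)

# 2. Train the extracted model on the victim's soft labels
#    (cross-entropy loss, full-batch gradient descent).
w, b, lr = np.zeros(D), 0.0, 1.0
for _ in range(500):
    err = sigmoid(X_query @ w + b) - p_victim  # d(loss)/d(logit) for each query
    w -= lr * X_query.T @ err / len(X_query)
    b -= lr * err.mean()

# 3. Estimate fidelity: how often the two models pick the same label on fresh inputs.
X_test = rng.normal(size=(5000, D))
agree = (X_test @ w + b > 0) == (X_test @ w_victim + b_victim > 0)
fidelity = float(agree.mean())
print(f"estimated fidelity: {fidelity:.3f}")
```

The agreement rate estimates fidelity only on sampled inputs; a high value does not establish that the two models agree everywhere, which is the gap between high fidelity and functional equivalence.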
Abstract
In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model, i.e., extracting a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical…
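The two objectives can be made concrete as two agreement rates over a set of evaluation inputs: accuracy compares the extracted model with ground-truth labels, while fidelity compares it with the victim's predictions. In this minimal sketch the labels are made up for illustration and the function names are hypothetical.

```python
import numpy as np


def accuracy(extracted_preds, true_labels):
    """Fraction of inputs where the extracted model predicts the task label."""
    return float(np.mean(np.asarray(extracted_preds) == np.asarray(true_labels)))


def fidelity(extracted_preds, victim_preds):
    """Fraction of inputs where the extracted model agrees with the victim."""
    return float(np.mean(np.asarray(extracted_preds) == np.asarray(victim_preds)))


# Hypothetical predictions on five inputs; the victim is wrong on the third one.
true_labels = [0, 1, 1, 0, 2]
victim = [0, 1, 0, 0, 2]
copy_a = [0, 1, 1, 0, 2]  # fixes the victim's mistake
copy_b = [0, 1, 0, 0, 2]  # reproduces the victim's mistake

print(accuracy(copy_a, true_labels), fidelity(copy_a, victim))  # 1.0 0.8
print(accuracy(copy_b, true_labels), fidelity(copy_b, victim))  # 0.8 1.0
```

`copy_a` is the more accurate extraction, while `copy_b` is the more faithful one because it reproduces the victim's mistake. Measured on a finite sample, fidelity is only an estimate; a functionally-equivalent model must agree with the victim on every possible input.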
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
