High Accuracy and High Fidelity Extraction of Neural Networks

Matthew Jagielski; Nicholas Carlini; David Berthelot; Alex Kurakin; Nicolas Papernot

arXiv:1909.01838 · cs.LG · March 5, 2020 · 55 cites

Open Access

TL;DR

This paper introduces a practical method for extracting neural network models with high accuracy and fidelity, highlighting the limitations of existing learning-based attacks and demonstrating the feasibility of direct, functionally-equivalent extraction on real-world systems.

Contribution

The paper presents the first practical attack for direct, functionally-equivalent extraction of neural network weights, expanding on prior work and demonstrating real-world applicability.

Findings

01. High-accuracy extraction using a learning-based approach.

02. Inherent limitations prevent perfect fidelity in learning-based methods.

03. Practical direct extraction attack demonstrated on a large-scale image classifier.

Abstract

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model, i.e., a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical…
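The distinction between accuracy and fidelity can be made concrete with a toy sketch. The example below is not the paper's attack or experimental setup: the ground-truth rule, the linear `victim_oracle`, the query budget, and the logistic-regression training loop are all hypothetical choices made for illustration. It shows a learning-based extraction in miniature (query the victim, train on its labels) and then scores the extracted model two ways: against the true labels (accuracy) and against the victim's own predictions (fidelity).

```python
import numpy as np

def accuracy(pred, true_labels):
    """Fraction of predictions that match the ground-truth labels."""
    return float(np.mean(np.asarray(pred) == np.asarray(true_labels)))

def fidelity(pred, victim_pred):
    """Fraction of predictions that match the victim's predictions."""
    return float(np.mean(np.asarray(pred) == np.asarray(victim_pred)))

rng = np.random.default_rng(0)

# Hypothetical ground truth: the label is 1 when the first feature is positive.
def true_label(x):
    return (x[:, 0] > 0).astype(int)

# Hypothetical victim: a linear classifier that is close to, but not exactly,
# the ground-truth rule. The adversary only sees its output labels.
w_victim = np.array([1.0, 0.3])

def victim_oracle(x):
    return (x @ w_victim > 0).astype(int)

# Learning-based extraction (toy version): query the oracle on random inputs,
# then fit logistic regression to the victim's labels by gradient descent.
x_query = rng.normal(size=(2000, 2))
y_query = victim_oracle(x_query)

w_extracted = np.zeros(2)
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-(x_query @ w_extracted)))
    grad = x_query.T @ (probs - y_query) / len(x_query)
    w_extracted -= 0.5 * grad

def extracted_model(x):
    return (x @ w_extracted > 0).astype(int)

# Score the extracted model on fresh inputs under both objectives.
x_test = rng.normal(size=(5000, 2))
extracted_pred = extracted_model(x_test)
acc = accuracy(extracted_pred, true_label(x_test))
fid = fidelity(extracted_pred, victim_oracle(x_test))
print(f"accuracy vs. ground truth: {acc:.3f}")
print(f"fidelity vs. victim:       {fid:.3f}")
```

In this sketch the extracted model inherits the victim's mistakes, so its fidelity is higher than its accuracy. The small remaining disagreement with the victim illustrates, in a very simplified way, why agreement on a test set is a weaker goal than functional equivalence on all possible inputs.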

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications