Stealing Knowledge from Protected Deep Neural Networks Using Composite Unlabeled Data
Itay Mosafi, Eli David, Nathan S. Netanyahu

TL;DR
This paper introduces a novel composite image-based attack method that can effectively mimic and steal knowledge from protected deep neural networks without requiring access to their training data, architecture, or confidence scores.
Contribution
The authors propose a new composite image generation technique for black-box neural network stealing, demonstrating its effectiveness even without access to softmax outputs or training data.
Findings
The attack successfully mimics target networks without prior knowledge.
Models stolen with this attack go undetected by current watermarking defenses.
All tested networks are vulnerable to this composite image attack.
Abstract
As state-of-the-art deep neural networks are deployed at the core of more advanced AI-based products and services, the incentive for copying them (i.e., their intellectual properties) by rival adversaries is expected to increase considerably over time. The best way to extract or steal knowledge from such networks is by querying them using a large dataset of random samples and recording their output, followed by training a student network to mimic these outputs, without making any assumption about the original networks. The most effective way to protect against such a mimicking attack is to provide only the classification result, without confidence values associated with the softmax layer. In this paper, we present a novel method for generating composite images for attacking a mentor neural network using a student model. Our method assumes no information regarding the mentor's training…
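The query-and-record step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how composite images are built, so the code assumes a composite is a random convex blend of two unlabeled images. The names `make_composites`, `query_hard_labels`, and `toy_mentor` are hypothetical, and `toy_mentor` is only a stand-in for a real black-box mentor that returns class indices without confidence values.

```python
import numpy as np


def make_composites(images, n, rng):
    """Blend random pairs of unlabeled images (assumed composite scheme)."""
    i = rng.integers(0, len(images), size=n)
    j = rng.integers(0, len(images), size=n)
    # One mixing ratio per composite, shaped to broadcast over image dims.
    lam = rng.uniform(0.0, 1.0, size=(n,) + (1,) * (images.ndim - 1))
    return lam * images[i] + (1.0 - lam) * images[j]


def query_hard_labels(mentor_predict, batch):
    """Record only the predicted class index, as a protected mentor exposes."""
    return np.asarray(mentor_predict(batch), dtype=np.int64)


def toy_mentor(batch):
    # Hypothetical stand-in for the mentor: a fixed random linear classifier
    # over 3 classes. A real attack would call the deployed model instead.
    w = np.random.default_rng(0).normal(size=(batch[0].size, 3))
    return (batch.reshape(len(batch), -1) @ w).argmax(axis=1)


rng = np.random.default_rng(42)
unlabeled = rng.uniform(0.0, 1.0, size=(100, 8, 8))  # placeholder unlabeled data
composites = make_composites(unlabeled, 256, rng)
labels = query_hard_labels(toy_mentor, composites)
# (composites, labels) would then serve as the student's training set.
```

Because each composite is a convex combination, its pixel values stay within the range of the source images, so the mentor is queried with inputs of the same scale as ordinary data.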
Taxonomy
Methods: Softmax
