Beyond Labeling Oracles: What does it mean to steal ML models?
Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot

TL;DR
This paper critically examines model extraction attacks, showing that the attacker's prior knowledge of the data distribution matters more than the attack strategy, and it challenges how attack success is currently evaluated.
Contribution
It provides a comprehensive analysis of factors influencing model extraction success, emphasizing the importance of prior data knowledge over attack policies.
Findings
Prior knowledge of in-distribution data is the dominant factor in attack success.
Current evaluation methods misinterpret the effectiveness of model extraction attacks.
Assumptions about cost savings in data acquisition are often invalid in practice.
Abstract
Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. Machine Learning (ML) models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We thoroughly evaluate this assumption and find that the attacker often does not. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly research factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e., access to in-distribution data, dominates…
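The query-only setting described in the abstract can be illustrated with a toy sketch. This is not the paper's experimental setup: the circular decision rule, the 1-nearest-neighbour surrogate, the Gaussian query distributions, and all function names below are assumptions chosen only to make the idea concrete. The victim acts purely as a labeling oracle, and the surrogate is built once from in-distribution queries and once from out-of-distribution queries.

```python
import numpy as np

rng = np.random.default_rng(0)


def victim_predict(x):
    """Hypothetical victim model: label 1 inside a circle around the origin."""
    return (np.sum(x ** 2, axis=1) < 1.5).astype(int)


def extract_surrogate(queries):
    """Label the queries through the victim and return a 1-NN surrogate."""
    labels = victim_predict(queries)  # the only access to the victim

    def surrogate_predict(x):
        # Squared distance from every input point to every query point.
        d2 = np.sum((x[:, None, :] - queries[None, :, :]) ** 2, axis=2)
        return labels[np.argmin(d2, axis=1)]

    return surrogate_predict


def fidelity(surrogate_predict, x):
    """Fraction of points on which surrogate and victim agree."""
    return float(np.mean(surrogate_predict(x) == victim_predict(x)))


n_queries = 500
# Assumed victim data distribution: a standard 2-D Gaussian.
test_points = rng.normal(0.0, 1.0, size=(1000, 2))
in_dist_queries = rng.normal(0.0, 1.0, size=(n_queries, 2))
# Assumed attacker without prior knowledge: queries drawn far from the data.
out_dist_queries = rng.normal(6.0, 1.0, size=(n_queries, 2))

fid_in = fidelity(extract_surrogate(in_dist_queries), test_points)
fid_out = fidelity(extract_surrogate(out_dist_queries), test_points)
print(f"fidelity with in-distribution queries:     {fid_in:.3f}")
print(f"fidelity with out-of-distribution queries: {fid_out:.3f}")
```

With the same query budget, the surrogate built from in-distribution queries agrees with the victim far more often in this toy setting, because the out-of-distribution queries never probe the region where the victim's decision changes. The example is deliberately constructed to show the effect and should not be read as evidence for the paper's empirical claims.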
Peer Reviews
Decision · Submitted to ICLR 2024
The literature on this topic is scattered, and it is very difficult to have a clear assessment of the real-world impact of model extraction attacks, as the considered settings can be very different in their assumptions (knowledge of model architecture, training data distribution, preprocessing steps, output access, application domain, etc.). Often, even comparing different attacks and defenses is not straightforward. This work has the merit of trying to address some of these issues.
Although the paper's considerations and findings are very interesting, its contribution seems limited by the case studies considered, whereas the authors' conclusions are meant to apply generally. Model stealing attacks can be performed for different purposes and in different settings, but the settings analyzed and evaluated in this work cover only a very small subset of them: for instance, the attacker might easily obtain some data from the training distribution (or simila
- Studying model extraction attacks is a crucial avenue in the study of adversarial attacks, given the resource-intensive nature of training ML/DL models.
- The paper is easy to follow.
- While the authors draw conclusions from a diverse set of experiments, there appears to be a need for a principled approach to assess model extraction attacks. It would be valuable if the authors could provide clear and concise definitions that offer a unified perspective on all the attacks and defenses discussed in their paper. Currently, the experimental findings, while intuitive, seem somewhat fragmented and lack a well-organized presentation.
- Similar challenges are evident in the comparis
- Trendy topic
- New perspective for ME attacks
- More evaluation is needed
- Presentation can be improved
- Lack of certain theoretical support
Taxonomy
Topics: Network Security and Intrusion Detection · Security and Verification in Computing · Adversarial Robustness in Machine Learning
