Careful What You Wish For: on the Extraction of Adversarially Trained Models
Kacem Khaled, Gabriela Nicolescu, Felipe Gohring de Magalhães

TL;DR
This paper evaluates how adversarial training, intended to improve model robustness, inadvertently increases vulnerability to extraction attacks, revealing that robust models can be more easily replicated and pose privacy risks.
Contribution
First comprehensive empirical assessment of how vulnerable adversarially trained vision models are to extraction attacks, highlighting the increased risk and the transfer of robustness to the extracted model.
Findings
Adversarially trained models are more vulnerable to extraction attacks.
Models extracted from adversarially trained victims reach higher accuracy with fewer queries.
Robustness transfers to the extracted models, making them more resilient to adversarial examples.
Abstract
Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose several security and privacy threats. Prior work proposes adversarial training to secure models against adversarial examples, which can evade a model's classification and degrade its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, so it might raise model privacy risks. In fact, a malicious user with only query access to a model's prediction output can extract it and obtain a high-accuracy, high-fidelity surrogate model. To extract more effectively, these attacks leverage the prediction probabilities of the victim model. Indeed, no previous work on extraction attacks takes into consideration the changes in the training process for…
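The query-only extraction setting described in the abstract can be illustrated with a toy sketch. This is not the authors' attack or experimental setup: the victim here is a hypothetical linear softmax classifier with random weights, the attacker's queries are random Gaussian inputs, and the names `make_victim`, `extract`, and `fidelity` are invented for illustration. It only shows the general idea of fitting a surrogate to the victim's prediction probabilities and measuring fidelity as label agreement.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def make_victim(n_features, n_classes, rng):
    # Hypothetical victim: a fixed linear softmax classifier.
    # The attacker never sees W, only the returned probabilities.
    W = rng.normal(size=(n_features, n_classes))

    def predict_proba(x):
        return softmax(x @ W)

    return predict_proba


def extract(victim, n_queries, n_features, n_classes, rng, lr=0.5, epochs=300):
    # Query the victim on attacker-chosen inputs and keep its probabilities.
    x = rng.normal(size=(n_queries, n_features))
    p = victim(x)
    # Fit a surrogate by gradient descent on cross-entropy against the
    # victim's soft labels (assumed attack recipe, for illustration only).
    W = np.zeros((n_features, n_classes))
    for _ in range(epochs):
        q = softmax(x @ W)
        W -= lr * x.T @ (q - p) / n_queries
    return lambda inputs: softmax(inputs @ W)


def fidelity(victim, surrogate, x):
    # Fraction of inputs on which surrogate and victim predict the same label.
    return float(np.mean(victim(x).argmax(axis=1) == surrogate(x).argmax(axis=1)))


rng = np.random.default_rng(0)
victim = make_victim(n_features=10, n_classes=3, rng=rng)
surrogate = extract(victim, n_queries=500, n_features=10, n_classes=3, rng=rng)
x_test = rng.normal(size=(2000, 10))
score = fidelity(victim, surrogate, x_test)
print(f"fidelity (label agreement): {score:.3f}")
```

Comparing how `score` changes with `n_queries` for two different victims is the kind of measurement the findings refer to, although the paper's actual models, datasets, and attacks are not reproduced here.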
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Autopsy Techniques and Outcomes
