Careful What You Wish For: on the Extraction of Adversarially Trained Models

Kacem Khaled; Gabriela Nicolescu; Felipe Gohring de Magalhães

arXiv:2207.10561·cs.LG·August 23, 2022

TL;DR

This paper evaluates how adversarial training, intended to improve model robustness, inadvertently increases vulnerability to extraction attacks, revealing that robust models can be more easily replicated and pose privacy risks.

Contribution

First comprehensive empirical assessment of extraction attack vulnerability on adversarially trained vision models, highlighting increased risks and transferability of robustness.

Findings

01

Adversarially trained models are more vulnerable to extraction attacks.

02

Models extracted from adversarially trained victims achieve higher accuracy with fewer queries.

03

Robustness features transfer to extracted models, enhancing adversarial resilience.

Abstract

Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose several security and privacy threats. Prior work proposes adversarial training to secure models against adversarial examples that can evade a model's classification and degrade its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, and hence might raise model privacy risks. In fact, a malicious user with only query access to a model's prediction output can extract it and obtain a high-accuracy, high-fidelity surrogate model. To improve extraction, these attacks leverage the prediction probabilities of the victim model. Indeed, no previous work on extraction attacks takes into consideration the changes in the training process for…
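
The query-only extraction setting described in the abstract can be illustrated with a toy sketch. This is not the authors' method or their repository code: the victim here is a hypothetical linear softmax classifier, the query data is random, and every name (`query_victim`, `W_victim`, `W_surrogate`) is an assumption made for illustration. The attacker only sees the victim's prediction probabilities, fits a surrogate to those soft labels, and then measures fidelity, taken here to mean the rate of label agreement between surrogate and victim.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with max subtraction for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical victim: a fixed linear softmax classifier (2 features, 3 classes).
# The attacker never reads W_victim directly, only the outputs of query_victim.
W_victim = rng.normal(size=(2, 3)) * 2.0

def query_victim(x):
    # Query access: returns prediction probabilities for each input row.
    return softmax(x @ W_victim)

# Attacker side: choose query inputs and collect the victim's soft labels.
X_query = rng.normal(size=(500, 2))
P_victim = query_victim(X_query)

# Train a surrogate of the same (assumed) architecture by gradient descent
# on the cross-entropy between the victim's probabilities and its own.
W_surrogate = np.zeros((2, 3))
learning_rate = 0.5
for _ in range(500):
    P_surrogate = softmax(X_query @ W_surrogate)
    # Gradient of soft-label cross-entropy with respect to the weights.
    grad = X_query.T @ (P_surrogate - P_victim) / len(X_query)
    W_surrogate -= learning_rate * grad

# Fidelity: fraction of held-out inputs where both models predict the same label.
X_test = rng.normal(size=(1000, 2))
victim_labels = query_victim(X_test).argmax(axis=1)
surrogate_labels = softmax(X_test @ W_surrogate).argmax(axis=1)
fidelity = float(np.mean(victim_labels == surrogate_labels))
print(f"fidelity on held-out queries: {fidelity:.3f}")
```

Accuracy would be computed the same way against ground-truth labels instead of the victim's labels. The toy victim has no adversarial training, so the sketch shows only the attack loop and the fidelity metric, not the paper's comparison between robust and naturally trained models.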

Code & Models

Repositories

KacemKhaled/model-stealing
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Autopsy Techniques and Outcomes