Model extraction from counterfactual explanations

Ulrich Aïvodji; Alexandre Bolot; Sébastien Gambs

arXiv:2009.01884 · cs.LG · September 7, 2020 · 20 cites

TL;DR

This paper shows how counterfactual explanations, used for interpreting black-box models, can be exploited by adversaries to perform high-fidelity model extraction attacks, raising privacy concerns.

Contribution

It introduces a novel attack leveraging counterfactual explanations to accurately replicate black-box models, highlighting privacy vulnerabilities.

Findings

01. High-fidelity model extraction achievable with limited queries
02. Counterfactual explanations leak significant model information
03. Attack effective on real-world datasets

Abstract

Post-hoc explanation techniques refer to a posteriori methods that can be used to explain how black-box machine learning models produce their outcomes. Among post-hoc explanation techniques, counterfactual explanations are becoming one of the most popular methods to achieve this objective. In particular, in addition to highlighting the most important features used by the black-box model, they provide users with actionable explanations in the form of data instances that would have received a different outcome. Nonetheless, by doing so, they also leak non-trivial information about the model itself, which raises privacy issues. In this work, we demonstrate how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks. More precisely, our attack enables the adversary to build a faithful copy of a…
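The general idea described in the abstract can be illustrated with a toy sketch. This is not the authors' attack or code: the target model `black_box`, the bisection-based `counterfactual` routine, and the 1-nearest-neighbour surrogate are all hypothetical stand-ins chosen for brevity. It only shows why counterfactuals are useful to an adversary: each one is an extra labelled point that lies close to the decision boundary.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def black_box(x):
    # Hypothetical target model: a linear rule the adversary cannot inspect.
    return 1 if 2.0 * x[0] - x[1] + 0.5 > 0 else 0


def sample_point():
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))


def counterfactual(x, iters=30):
    # Toy explanation API (an assumption, not a real library): bisect between
    # x and a random opposite-class point, returning a point just across the
    # decision boundary from x.
    y = black_box(x)
    far = sample_point()
    while black_box(far) == y:
        far = sample_point()
    near = x
    for _ in range(iters):
        mid = ((near[0] + far[0]) / 2, (near[1] + far[1]) / 2)
        if black_box(mid) == y:
            near = mid
        else:
            far = mid
    return far  # always labelled differently from x


def nearest_neighbour(train, x):
    # 1-NN surrogate: copy the label of the closest training point.
    closest = min(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return closest[1]


def fidelity(train, points):
    # Fraction of points on which the surrogate agrees with the target model.
    agree = sum(nearest_neighbour(train, x) == black_box(x) for x in points)
    return agree / len(points)


queries = [sample_point() for _ in range(20)]
labels_only = [(x, black_box(x)) for x in queries]
counterfactuals = [counterfactual(x) for x in queries]
# The adversary knows each counterfactual carries the opposite label.
with_cf = labels_only + [(c, 1 - y) for c, (_, y) in zip(counterfactuals, labels_only)]

test_points = [sample_point() for _ in range(500)]
print("fidelity, labels only:        ", fidelity(labels_only, test_points))
print("fidelity, labels + explanations:", fidelity(with_cf, test_points))
```

The same number of queries yields twice as many training points when explanations are returned, and the extra points sit near the boundary, which is where a surrogate most needs data. How much this helps in practice depends on the model, the explanation method, and the surrogate.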

Code & Models

Repositories

aivodji/mrce · tf · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education