Inferring Sensitive Attributes from Model Explanations

Vasisht Duddu; Antoine Boutet

arXiv:2208.09967·cs.CR·September 9, 2022·1 cites

TL;DR

This paper demonstrates that model explanations can be exploited by adversaries to accurately infer sensitive attributes of input data, revealing significant privacy risks associated with explanation methods in machine learning.

Contribution

It introduces the first attribute inference attack targeting model explanations, highlighting privacy vulnerabilities even when sensitive attributes are excluded from training data.

Findings

01

Adversaries can successfully infer sensitive attributes from explanations.

02

The attack remains effective even when sensitive attributes are not included in training data.

03

Exploiting explanations yields better inference success than using model predictions alone.

Abstract

Model explanations provide transparency into a trained machine learning model's black-box behavior to a model builder. They indicate the influence of different input attributes on the corresponding model prediction. The dependency of explanations on the input raises privacy concerns for sensitive user data. However, the current literature has limited discussion of the privacy risks of model explanations. We focus on the specific privacy risk of attribute inference attacks, wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models, where the model builder either (a) includes the sensitive attributes in the training data and input or (b) censors the sensitive attributes by not including them in the training data and input. We evaluate our proposed attack on four…
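To make threat model (b) concrete, here is a minimal sketch of how an attribute inference attack on explanations could look. It is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the fixed linear target model `w`, the gradient-times-input explanation in `explain`, the logistic-regression attack model, and the premise that the adversary holds an auxiliary set of explanations with known sensitive attributes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic data (assumption): a binary sensitive attribute s that is censored
# from the target model's input but correlated with feature 0.
s = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))
X[:, 0] += 1.5 * s  # feature 0 acts as a proxy for s

# Hypothetical target model: a fixed linear scorer that never sees s.
w = np.array([1.0, -0.5, 0.8])


def explain(X, w):
    """Gradient-times-input attribution of a linear model: one score per feature."""
    return X * w


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_attack_model(E, y, lr=0.1, steps=500):
    """Logistic regression from explanation vectors to the sensitive attribute."""
    E1 = np.hstack([E, np.ones((len(E), 1))])  # append a bias column
    theta = np.zeros(E1.shape[1])
    for _ in range(steps):
        p = sigmoid(E1 @ theta)
        theta -= lr * E1.T @ (p - y) / len(y)  # gradient of the mean log loss
    return theta


def infer_attribute(theta, E):
    """Predict the sensitive attribute (0 or 1) for each explanation vector."""
    E1 = np.hstack([E, np.ones((len(E), 1))])
    return (E1 @ theta > 0).astype(int)


E = explain(X, w)
half = n // 2

# The adversary fits on auxiliary explanations with known s (first half),
# then attacks explanations of unseen inputs (second half).
theta = fit_attack_model(E[:half], s[:half])
attack_acc = (infer_attribute(theta, E[half:]) == s[half:]).mean()
print(f"attack accuracy: {attack_acc:.3f} (random guessing is about 0.5)")
```

In this toy setup the attack succeeds only because a released attribution score still carries the proxy feature that correlates with the censored attribute. The numbers it prints say nothing about the results reported in the paper.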

Code & Models

Repositories

vasishtduddu/attinfexplanations
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data