Inferring Sensitive Attributes from Model Explanations
Vasisht Duddu, Antoine Boutet

TL;DR
This paper demonstrates that model explanations can be exploited by adversaries to accurately infer sensitive attributes of input data, revealing significant privacy risks associated with explanation methods in machine learning.
Contribution
It introduces the first attribute inference attack targeting model explanations, highlighting privacy vulnerabilities even when sensitive attributes are excluded from training data.
Findings
Adversaries can successfully infer sensitive attributes from explanations.
The attack remains effective even when sensitive attributes are censored, i.e., excluded from both the training data and the input.
Exploiting explanations yields better inference success than using model predictions alone.
Abstract
Model explanations provide transparency into a trained machine learning model's black-box behavior to a model builder. They indicate the influence of different input attributes on the corresponding model prediction. The dependency of explanations on the input raises privacy concerns for sensitive user data. However, current literature has limited discussion of the privacy risks of model explanations. We focus on the specific privacy risk of attribute inference attacks, wherein an adversary infers sensitive attributes of an input (e.g., race and sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models, where the model builder either (a) includes the sensitive attributes in the training data and input or (b) censors the sensitive attributes by not including them in the training data and input. We evaluate our proposed attack on four…
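Illustrative sketch
The censored threat model (b) can be illustrated with a toy example. The sketch below is not the authors' implementation: the synthetic data, the gradient-times-input explanation of a linear model, the logistic-regression attack model, and all names (fit_logreg, explain, attack_acc) are assumptions chosen to keep the example self-contained. It only shows the general idea that explanations of a model trained without a sensitive attribute can still reveal that attribute when other features correlate with it.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.5, steps=500):
    """Logistic regression fitted by plain gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic data (assumption): a binary sensitive attribute s shifts two features.
n = 4000
s = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 4))
X[:, 0] += 1.5 * s
X[:, 1] -= 1.0 * s
y = (X[:, 0] - 0.5 * X[:, 1] + X[:, 2] + 0.5 * rng.normal(size=n) > 1.0).astype(float)

# Target model: trained on X only, so s is censored from training data and input.
w_target, b_target = fit_logreg(X, y)

def explain(X):
    """Gradient-times-input attribution of the linear model's logit: w_i * x_i per feature."""
    return X * w_target

E = explain(X)

# Adversary: knows s for an auxiliary half, trains an attack model on explanations,
# then predicts s for the remaining records from their explanations alone.
half = n // 2
mu = E[:half].mean(axis=0)
sd = E[:half].std(axis=0) + 1e-8
w_attack, b_attack = fit_logreg((E[:half] - mu) / sd, s[:half].astype(float))
pred = sigmoid(((E[half:] - mu) / sd) @ w_attack + b_attack) > 0.5

attack_acc = float(np.mean(pred == s[half:]))
# Baseline: always guessing the majority class of s.
baseline = float(max(s[half:].mean(), 1 - s[half:].mean()))
print(f"attack accuracy: {attack_acc:.3f}, majority baseline: {baseline:.3f}")
```

In this toy setting the attack works only because the remaining features are correlated with s; with independent features the explanations would carry no signal about it.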
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data
