# On the Privacy Risks of Model Explanations

**Authors:** Reza Shokri, Martin Strobel, Yair Zick

arXiv: 1907.00164 · 2021-02-08

## TL;DR

This paper uses membership inference attacks to show that feature-based model explanations can inadvertently leak information about a model's training data, highlighting a privacy risk of explainability methods.

## Contribution

The paper provides a comprehensive analysis of the privacy risks of model explanations, in particular demonstrating that backpropagation-based explanations can reveal whether a given point was part of the model's training set.
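As a rough illustration of what a backpropagation-based explanation is, the sketch below computes the input gradient of a logistic-regression model, where backpropagation reduces to the closed form `p * (1 - p) * w`. The model, weights, inputs, and function names are hypothetical and not taken from the paper. The sketch also hints at one intuition for the leakage: the gradient is large near the decision boundary and shrinks where the model is confident, so the explanation carries information about where an input sits relative to that boundary.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def gradient_explanation(w, b, x):
    """Input gradient of p(x) = sigmoid(w . x + b) for a hypothetical
    logistic-regression model. Backpropagating through the sigmoid gives
    the closed form p * (1 - p) * w."""
    p = sigmoid(np.dot(w, x) + b)
    return p * (1.0 - p) * w


# Hypothetical model parameters and inputs, chosen for illustration only.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x_near = np.array([0.1, 0.1, 0.0])  # w . x + b = 0.1, close to the boundary
x_far = np.array([3.0, -3.0, 0.0])  # w . x + b = 9.0, confidently classified

print(np.linalg.norm(gradient_explanation(w, b, x_near)))
print(np.linalg.norm(gradient_explanation(w, b, x_far)))
```

The near-boundary explanation comes out orders of magnitude larger than the one for the confidently classified point, and anyone who can observe explanations sees that difference directly.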

## Key findings

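- Backpropagation-based explanations can leak a significant amount of information about individual training datapoints.
- They do so because they reveal statistical information about the model's decision boundary around an input, which can indicate whether that input was in the training set.
- Perturbation-based explanations can mitigate some of this leakage; the paper studies them to examine the trade-off between privacy and explanation quality.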
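Perturbation-based methods explain a prediction by querying the model on perturbed copies of the input rather than backpropagating through it. Below is a minimal LIME-style sketch, assuming Gaussian perturbations and a Gaussian proximity kernel. The parameters `n_samples` and `sigma`, the toy logistic model, and all function names are illustrative assumptions; this is not necessarily one of the exact methods the paper evaluates.

```python
import numpy as np


def predict_proba(w, b, X):
    """Hypothetical black-box model: positive-class probability per row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def perturbation_explanation(model, x, n_samples=500, sigma=0.5, seed=0):
    """LIME-style sketch: fit a weighted linear surrogate to the model's
    outputs on Gaussian perturbations of x and return its slopes."""
    rng = np.random.default_rng(seed)
    Z = x + sigma * rng.standard_normal((n_samples, x.size))
    y = model(Z)  # only black-box queries, no gradients
    # Proximity weights: perturbations closer to x count more.
    d2 = np.sum((Z - x) ** 2, axis=1)
    weights = np.exp(-d2 / (2 * sigma**2))
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept, keep per-feature attributions


w = np.array([2.0, -1.0, 0.5])
b = 0.0
model = lambda X: predict_proba(w, b, X)
print(perturbation_explanation(model, np.array([0.1, 0.1, 0.0])))
```

Because the surrogate averages over a neighbourhood of width `sigma`, it reports a smoothed view of the model around `x`. That smoothing is one plausible intuition for why such explanations can leak less, at some cost in fidelity to the model's exact local behaviour.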

## Abstract

Privacy and transparency are two key foundations of trustworthy machine learning. Model explanations offer insights into a model's decisions on input data, whereas privacy is primarily concerned with protecting information about the training data. We analyze connections between model explanations and the leakage of sensitive information about the model's training set. We investigate the privacy risks of feature-based model explanations using membership inference attacks: quantifying how much model predictions plus their explanations leak information about the presence of a datapoint in the training set of a model. We extensively evaluate membership inference attacks based on feature-based model explanations, over a variety of datasets. We show that backpropagation-based explanations can leak a significant amount of information about individual training datapoints. This is because they reveal statistical information about the decision boundaries of the model about an input, which can reveal its membership. We also empirically investigate the trade-off between privacy and explanation quality, by studying the perturbation-based model explanations.
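The abstract describes the attack only at a high level. One simple instantiation is a threshold attack on a scalar statistic of each explanation. The sketch below assumes the statistic is the variance of the explanation vector and that training members tend to receive lower-variance explanations. Both the statistic and the direction of the threshold are assumptions here, and the function names are hypothetical; in practice they would be calibrated on data where membership is known, such as a shadow model's training and held-out sets.

```python
import numpy as np


def explanation_statistic(explanations):
    """Per-point attack score: variance across the features of each
    explanation vector (rows = points, columns = features)."""
    return np.var(explanations, axis=1)


def fit_threshold(scores, is_member):
    """Choose the threshold that maximizes attack accuracy on a reference set
    with known membership. Scores at or below it are predicted 'member'."""
    best_t, best_acc = None, -1.0
    for t in np.unique(scores):
        acc = np.mean((scores <= t) == is_member)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def infer_membership(scores, threshold):
    """Predict membership for target points using the fitted threshold."""
    return scores <= threshold
```

Given an `(n_points, n_features)` array of explanations for a reference set with known membership, `fit_threshold(explanation_statistic(E_ref), member_ref)` returns a threshold, which `infer_membership` then applies to the scores of target points.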

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.00164/full.md

## Figures

39 figures with captions in the complete paper: https://tomesphere.com/paper/1907.00164/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/1907.00164/full.md

---
Source: https://tomesphere.com/paper/1907.00164