Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference
Catherine Huang, Martin Pawelczyk, Himabindu Lakkaraju

TL;DR
This paper uncovers significant privacy risks associated with post-hoc explanations of machine learning models, especially in sensitive image classification tasks, and proposes mitigation strategies like differentially private fine-tuning to protect data privacy.
Contribution
The work introduces two new membership inference attacks based on feature attribution explanations and demonstrates effective privacy mitigation via differentially private fine-tuning.
Findings
New attacks (VAR-LRT and L1/L2-LRT) outperform existing methods in identifying training data members.
Differentially private fine-tuning reduces attack success while maintaining high model accuracy.
These results are validated empirically across multiple architectures, datasets, explanation methods, and privacy settings.
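To make the attack idea in these findings concrete, here is a minimal sketch of a likelihood-ratio membership test on a scalar summary of a feature attribution. It is an illustration under stated assumptions, not the authors' implementation. It assumes the adversary already holds statistics from reference ("shadow") models that did and did not train on the target example, and that each group is roughly Gaussian. All function names are hypothetical. The helper `tpr_at_fpr` shows one way to score an attack in the low false-positive rate regime.

```python
import math
import numpy as np

def attribution_statistic(attribution, kind="var"):
    """Collapse one feature-attribution map into a scalar (assumed summary)."""
    a = np.asarray(attribution, dtype=float).ravel()
    if kind == "var":
        return float(np.var(a))
    if kind == "l1":
        return float(np.sum(np.abs(a)))
    if kind == "l2":
        return float(np.sqrt(np.sum(a ** 2)))
    raise ValueError(f"unknown statistic: {kind}")

def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def lrt_score(target_stat, in_stats, out_stats, eps=1e-12):
    """Log-likelihood ratio of 'member' vs 'non-member' under Gaussian fits.

    in_stats / out_stats: statistics from reference models trained with /
    without the example (an assumption of this sketch).
    """
    mu_in, sd_in = float(np.mean(in_stats)), float(np.std(in_stats)) + eps
    mu_out, sd_out = float(np.mean(out_stats)), float(np.std(out_stats)) + eps
    return gaussian_logpdf(target_stat, mu_in, sd_in) - gaussian_logpdf(target_stat, mu_out, sd_out)

def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """True-positive rate at the strictest threshold with FPR <= max_fpr."""
    desc = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    k = int(np.floor(max_fpr * len(desc)))  # number of false positives allowed
    # Predict "member" only for scores strictly above the (k+1)-th largest
    # non-member score, so at most k non-members are flagged.
    threshold = desc[k] if k < len(desc) else -np.inf
    return float(np.mean(np.asarray(member_scores, dtype=float) > threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for attribution statistics; not real results.
    in_stats = rng.normal(1.0, 0.2, size=64)
    out_stats = rng.normal(2.0, 0.2, size=64)
    stat = attribution_statistic(rng.normal(0.0, 1.0, size=(8, 8)), kind="var")
    print("LRT score:", lrt_score(stat, in_stats, out_stats))
```

A higher score means the observed statistic is better explained by the "member" fit. Switching `kind` between `"var"`, `"l1"`, and `"l2"` changes only the scalar summary; the test itself is unchanged.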
Abstract
Predictive machine learning models are becoming increasingly deployed in high-stakes contexts involving sensitive personal data; in these contexts, there is a trade-off between model explainability and data privacy. In this work, we push the boundaries of this trade-off: with a focus on foundation models for image classification fine-tuning, we reveal unforeseen privacy risks of post-hoc model explanations and subsequently offer mitigation strategies for such risks. First, we construct VAR-LRT and L1/L2-LRT, two new membership inference attacks based on feature attribution explanations that are significantly more successful than existing explanation-leveraging attacks, particularly in the low false-positive rate regime that allows an adversary to identify specific training set members with confidence. Second, we find empirically that optimized differentially private fine-tuning…
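The abstract breaks off before it describes the differentially private fine-tuning setup, so the following is only a generic sketch of one DP-SGD update (per-example gradient clipping followed by Gaussian noise), which is the standard building block for differentially private training. It is not the paper's optimized recipe. The function name and parameters are hypothetical, and the sketch does no privacy accounting.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update on a flat parameter vector.

    per_example_grads: array of shape (batch, dim), one gradient per example.
    clip_norm: maximum L2 norm allowed for each per-example gradient.
    noise_multiplier: noise std is noise_multiplier * clip_norm.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # Scale each example's gradient down so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Add Gaussian noise to the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return params - lr * noisy_mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = np.zeros(2)
    grads = np.array([[3.0, 4.0], [0.3, 0.4]])
    print(dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single training example can move the parameters, and the noise masks what remains of that contribution. Intuitively, this is why such training can weaken membership signals, including ones read off explanations.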
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Access Control and Trust · Privacy, Security, and Data Protection
