Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference

Catherine Huang; Martin Pawelczyk; Himabindu Lakkaraju

arXiv:2407.17663 · cs.CR · July 29, 2024

TL;DR

This paper uncovers significant privacy risks in post-hoc explanations of machine learning models, especially in sensitive image classification tasks, and proposes mitigation strategies such as differentially private fine-tuning.

Contribution

The work introduces two new membership inference attacks based on feature attribution explanations and demonstrates effective privacy mitigation via differentially private fine-tuning.

Findings

01  The new attacks (VAR-LRT and L1/L2-LRT) outperform existing explanation-based membership inference attacks at identifying training set members, particularly at low false-positive rates.

02  Differentially private fine-tuning reduces attack success while maintaining high model accuracy.

03  The results are validated empirically across multiple architectures, datasets, explanation methods, and privacy settings.
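
Finding 02 rests on differentially private training. This page does not describe the authors' training recipe, so the following is only a minimal sketch of the standard DP-SGD aggregation step: clip each per-example gradient, sum, add Gaussian noise, then average. The names `clip_gradient` and `dp_average_gradient` and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions, not taken from the paper or its repository.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, and average.

    Assumption: the noise standard deviation is noise_multiplier * max_norm,
    the usual DP-SGD convention. Privacy accounting is not shown here.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single training example can move the update, and the noise masks what remains of that influence. This is the mechanism that would plausibly weaken a membership signal, at some cost in accuracy.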

Abstract

Predictive machine learning models are increasingly deployed in high-stakes contexts involving sensitive personal data; in these contexts, there is a trade-off between model explainability and data privacy. In this work, we push the boundaries of this trade-off: with a focus on foundation models for image classification fine-tuning, we reveal unforeseen privacy risks of post-hoc model explanations and subsequently offer mitigation strategies for such risks. First, we construct VAR-LRT and L1/L2-LRT, two new membership inference attacks based on feature attribution explanations that are significantly more successful than existing explanation-leveraging attacks, particularly in the low false-positive rate regime that allows an adversary to identify specific training set members with confidence. Second, we find empirically that optimized differentially private fine-tuning…
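
To make the attack idea concrete, here is a minimal sketch of a likelihood-ratio membership test on a scalar summary of a feature attribution vector, together with the true-positive rate at a fixed false-positive rate that the abstract emphasizes. The abstract does not define the attacks. Using the attribution variance as the statistic (suggested only by the name VAR-LRT), fitting Gaussians to statistics from known members and non-members, and all function names below are assumptions for illustration.

```python
import math
import statistics


def attribution_variance(attribution):
    """Summary statistic: population variance of one attribution vector."""
    return statistics.pvariance(attribution)


def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def lrt_score(stat, in_stats, out_stats, eps=1e-12):
    """Log-likelihood ratio of 'member' vs 'non-member' for one statistic.

    in_stats / out_stats are statistics observed when the example was / was not
    in the training set (e.g. from shadow models; an assumption of this sketch).
    Higher scores mean 'more member-like'.
    """
    mu_in, sd_in = statistics.fmean(in_stats), statistics.pstdev(in_stats) + eps
    mu_out, sd_out = statistics.fmean(out_stats), statistics.pstdev(out_stats) + eps
    return gaussian_logpdf(stat, mu_in, sd_in) - gaussian_logpdf(stat, mu_out, sd_out)


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR at the strictest threshold whose FPR does not exceed target_fpr."""
    neg = sorted(nonmember_scores, reverse=True)
    allowed = int(target_fpr * len(neg))  # false positives we may tolerate
    # Predict 'member' only for scores strictly above this threshold.
    threshold = neg[allowed] if allowed < len(neg) else -math.inf
    true_positives = sum(s > threshold for s in member_scores)
    return true_positives / len(member_scores)


if __name__ == "__main__":
    stat = attribution_variance([0.1, -0.2, 0.4, 0.0])
    print(lrt_score(stat, in_stats=[0.04, 0.05, 0.06], out_stats=[0.10, 0.12, 0.14]))
    print(tpr_at_fpr([0.9, 0.8, 0.3], [0.5, 0.4, 0.2, 0.1], target_fpr=0.25))
```

Reporting the true-positive rate at a small fixed false-positive rate, rather than average accuracy, matches the abstract's point: an attack matters most when it can flag specific training examples with few false alarms.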

Code & Models

Repositories

catherinehuang82/explaining-model-protecting-data (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Access Control and Trust · Privacy, Security, and Data Protection