DeepLeak: Privacy Enhancing Hardening of Model Explanations Against Membership Leakage
Firas Ben Hmida, Zain Sbeih, Philemon Hailemariam, Birhanu Eshete

TL;DR
DeepLeak systematically assesses and mitigates privacy risks in ML explanations. It profiles membership leakage with a stronger attack, proposes practical hardening strategies, and analyzes the algorithmic causes of leakage to support privacy-preserving interpretability.
Contribution
It introduces a comprehensive framework for auditing and reducing membership inference risks in explanation methods, with practical mitigation techniques and analysis of leakage causes.
Findings
Under default configurations, explanation methods can leak up to 74.9% more membership information.
Mitigation strategies reduce leakage by up to 95%.
Privacy enhancements incur minimal loss in explanation quality.
Abstract
Machine learning (ML) explainability is central to algorithmic transparency in high-stakes settings such as predictive diagnostics and loan approval. However, these same domains require rigorous privacy guarantees, creating tension between interpretability and privacy. Although prior work has shown that explanation methods can leak membership information, practitioners still lack systematic guidance on selecting or deploying explanation techniques that balance transparency with privacy. We present DeepLeak, a system to audit and mitigate privacy risks in post-hoc explanation methods. DeepLeak advances the state-of-the-art in three ways: (1) comprehensive leakage profiling: we develop a stronger explanation-aware membership inference attack (MIA) to quantify how much representative explanation methods leak membership information under default configurations; (2) lightweight hardening…
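To make the idea of an explanation-aware membership inference attack more concrete, here is a minimal sketch of a generic threshold attack on explanations. It is not DeepLeak's attack, whose details are not given in this summary. It assumes, purely as a hypothetical, that training members tend to receive lower-variance feature attributions than non-members. The function names `attribution_variance`, `threshold_mia`, and `attack_advantage`, as well as the toy data and threshold, are illustrative.

```python
from statistics import pvariance
from typing import List, Sequence


def attribution_variance(attribution: Sequence[float]) -> float:
    """Population variance of a single feature-attribution vector."""
    return pvariance(attribution)


def threshold_mia(attributions: List[Sequence[float]], threshold: float) -> List[bool]:
    """Predict membership for each explanation.

    Hypothetical assumption: members get lower-variance attributions,
    so a variance below `threshold` is guessed to be a training member.
    """
    return [attribution_variance(a) < threshold for a in attributions]


def attack_advantage(predictions: List[bool], labels: List[bool]) -> float:
    """Membership advantage: true-positive rate minus false-positive rate."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return tp / pos - fp / neg


if __name__ == "__main__":
    # Toy attributions: two "members" (flat) and two "non-members" (spiky).
    explanations = [
        [0.1, 0.1, 0.2],
        [0.2, 0.2, 0.2],
        [0.9, -0.5, 0.1],
        [0.0, 1.0, -1.0],
    ]
    is_member = [True, True, False, False]
    guesses = threshold_mia(explanations, threshold=0.05)
    print(guesses)                                # [True, True, False, False]
    print(attack_advantage(guesses, is_member))   # 1.0
```

An advantage near 0 means the explanations reveal little about membership, while values near 1 indicate strong leakage. A hardening step would aim to push this number toward 0 without degrading explanation quality.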
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
