Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability
Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, and Reza Tourani

TL;DR
This paper investigates the role of hidden neuron features and explainability in membership inference attacks on deep learning models, proposing a new framework that improves attack success rates by up to 26%.
Contribution
The paper introduces an explainable, attack-driven framework that identifies the most influential raw data features and quantifies the importance of individual hidden neurons, two gaps that prior MIA research left open.
Findings
Improved MIA success rate by up to 26%.
Identified most informative neurons and features for attacks.
Quantified significance of hidden activations on attack accuracy.
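The summary above does not say which statistical test is used to pick the most informative neurons. As a minimal sketch of the general idea, the following assumes a Welch t-statistic per neuron, computed between activations recorded for members and for non-members. The function name `rank_neurons_by_welch_t`, the synthetic activations, and the planted signal in neurons 3 and 7 are all hypothetical and serve only as illustration.

```python
import numpy as np

def rank_neurons_by_welch_t(member_acts, nonmember_acts):
    """Rank neurons by |Welch t| between member and non-member activations.

    Both inputs have shape (n_samples, n_neurons). Returns the neuron
    indices sorted from most to least separating, plus the raw scores.
    """
    m_mean = member_acts.mean(axis=0)
    n_mean = nonmember_acts.mean(axis=0)
    m_var = member_acts.var(axis=0, ddof=1)
    n_var = nonmember_acts.var(axis=0, ddof=1)
    # Standard error of the difference in means, without assuming equal variances.
    se = np.sqrt(m_var / len(member_acts) + n_var / len(nonmember_acts))
    scores = np.abs((m_mean - n_mean) / np.maximum(se, 1e-12))
    order = np.argsort(-scores)
    return order, scores

# Synthetic stand-in for hidden activations (assumption: not real model data).
rng = np.random.default_rng(0)
n_samples, n_neurons = 500, 16
members = rng.normal(0.0, 1.0, size=(n_samples, n_neurons))
nonmembers = rng.normal(0.0, 1.0, size=(n_samples, n_neurons))
members[:, 3] += 1.5  # planted strong membership signal
members[:, 7] += 0.8  # planted weaker membership signal

order, scores = rank_neurons_by_welch_t(members, nonmembers)
top_k = order[:2]
print("most informative neurons:", top_k.tolist())
```

In a real whitebox setting the two activation matrices would come from forward passes of the target model on known training and held-out samples; any other two-sample statistic could be swapped in for the t-statistic.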
Abstract
The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root causes of attacks based on raw data features. In this paper, we aim to address these knowledge gaps by first exploring statistical approaches to identify the most informative neurons and quantifying the significance of the hidden activations from the selected neurons on attack accuracy, in isolation and combination. Additionally, we propose an attack-driven explainable framework by integrating the target and attack models to identify the most influential features of raw data that lead to…
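To make "in isolation and combination" concrete, the sketch below measures how well a very simple attack separates members from non-members when it sees one selected neuron versus several. It is an illustration under stated assumptions, not the paper's attack model: the nearest-centroid classifier, the helper name `centroid_attack_accuracy`, and the synthetic activations are all hypothetical.

```python
import numpy as np

def centroid_attack_accuracy(train_m, train_n, test_m, test_n, neurons):
    """Balanced accuracy of a nearest-centroid membership attack.

    Only the activations of the listed neurons are visible to the attack.
    """
    idx = np.asarray(neurons)
    c_m = train_m[:, idx].mean(axis=0)  # member centroid
    c_n = train_n[:, idx].mean(axis=0)  # non-member centroid

    def predict_member(x):
        d_m = np.linalg.norm(x[:, idx] - c_m, axis=1)
        d_n = np.linalg.norm(x[:, idx] - c_n, axis=1)
        return d_m < d_n

    tpr = predict_member(test_m).mean()     # members correctly flagged
    tnr = (~predict_member(test_n)).mean()  # non-members correctly rejected
    return 0.5 * (tpr + tnr)

# Synthetic activations (assumption): neurons 0 and 1 carry a membership signal.
rng = np.random.default_rng(1)
n, d = 2000, 8

def sample(is_member):
    acts = rng.normal(0.0, 1.0, size=(n, d))
    if is_member:
        acts[:, :2] += 1.5
    return acts

train_m, test_m = sample(True), sample(True)
train_n, test_n = sample(False), sample(False)

acc_single = centroid_attack_accuracy(train_m, train_n, test_m, test_n, [0])
acc_pair = centroid_attack_accuracy(train_m, train_n, test_m, test_n, [0, 1])
acc_null = centroid_attack_accuracy(train_m, train_n, test_m, test_n, [5])
print(f"neuron 0 alone: {acc_single:.3f}")
print(f"neurons 0 and 1: {acc_pair:.3f}")
print(f"uninformative neuron 5: {acc_null:.3f}")
```

Comparing the three numbers shows the kind of question the abstract raises: how much attack accuracy a single hidden activation yields on its own, and how much is gained by combining informative neurons.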
Taxonomy
Topics: Knowledge Management and Sharing · Online Learning and Analytics
