Mitigating Membership Inference in Intermediate Representations with Differentially Private Training
Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen

TL;DR
This paper proposes LM-DP-SGD, a layer-wise adaptive differentially private training method that reduces membership inference risks in intermediate representations while maintaining model utility.
Contribution
The paper introduces a layer-wise, MIA-risk-aware variant of DP-SGD that allocates privacy protection across layers according to estimated MIA vulnerability, improving the privacy-utility trade-off.
Findings
LM-DP-SGD reduces IR-level MIA risk under the same privacy budget.
The method preserves model utility better than uniform DP-SGD.
Theoretical guarantees are established for privacy and convergence.
Abstract
In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as…
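The abstract is cut off before it says how the per-layer attack error rates enter training, so the following is only a minimal sketch of one plausible reading, not the paper's confirmed mechanism. It assumes that a lower attack error rate means a more vulnerable layer, that each layer's per-example gradient is down-weighted accordingly before a single global clip, and that the Gaussian noise scale stays fixed. The names `layer_scales` and `dp_sgd_step`, and the specific scaling rule `error / max(error)`, are hypothetical and chosen for illustration.

```python
import numpy as np


def layer_scales(attack_error_rates):
    """Map per-layer MIA attack error rates to gradient scales in (0, 1].

    Assumption (not from the paper): a layer whose adversary makes fewer
    errors is more vulnerable, so its gradient is scaled down more and the
    fixed noise masks a larger share of its signal.
    """
    err = np.asarray(attack_error_rates, dtype=float)
    return err / max(err.max(), 1e-12)


def dp_sgd_step(per_example_grads, scales, clip_norm, noise_multiplier, rng):
    """One noisy gradient estimate with layer re-weighting and global clipping.

    per_example_grads: list with one array per layer, each of shape
    (batch, dim_l), holding flattened per-example gradients.
    Returns a list of noisy mean gradients, one array per layer.
    """
    # Re-weight each layer's per-example gradients (hypothetical rule).
    scaled = [s * g for s, g in zip(scales, per_example_grads)]

    # Concatenate layers so the clip applies to the whole per-example gradient.
    flat = np.concatenate(scaled, axis=1)
    norms = np.linalg.norm(flat, axis=1)
    factors = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = flat * factors[:, None]

    # Standard Gaussian mechanism: sensitivity is still bounded by clip_norm
    # because clipping happens after the re-weighting.
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    noisy_mean = (summed + noise) / flat.shape[0]

    # Split the flat vector back into per-layer pieces.
    sizes = [g.shape[1] for g in per_example_grads]
    return np.split(noisy_mean, np.cumsum(sizes)[:-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two layers; the second has a lower attack error, i.e. higher assumed risk.
    scales = layer_scales([0.5, 0.25])
    grads = [rng.normal(size=(8, 3)), rng.normal(size=(8, 2))]
    update = dp_sgd_step(grads, scales, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
    print([u.shape for u in update])
```

Because the clip is applied after re-weighting, every per-example contribution still has norm at most `clip_norm`, so in this sketch the noise calibration is the same as in uniform DP-SGD; only the relative share of each layer within the clipped gradient changes.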
