Membership Inference Attacks on Discrete Diffusion Language Models
Shailesh Kasivelrajan

TL;DR
This paper demonstrates that masked diffusion language models are highly vulnerable to membership inference attacks: the proposed attacks surpass existing baselines, and transfer attacks remain effective even when the shadow models are trained on unrelated data.
Contribution
The paper introduces membership inference attack techniques for masked diffusion language models (MDLMs), including loss-based feature extraction and shadow model transfer, and shows that these models carry significant privacy risks.
Findings
XGBoost classifiers achieve a mean AUC of 0.878 across six text domains on the MIMIR benchmark, peaking at 0.930 on Pile CC.
ELBO trajectory features are the primary driver of attack success.
The shadow model transfer attack achieves 0.858 AUC, close to white-box performance.
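All of the findings above are reported as AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen training-set member receives a higher attack score than a randomly chosen non-member. The function name `roc_auc` and the toy scores are illustrative assumptions, not values or code from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance that a random member outscores a random non-member.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for m in members:
        for n in non_members:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(members) * len(non_members))


# Hypothetical attack scores (higher means "more likely a member").
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members, which is the scale on which the 0.858 to 0.930 figures should be read.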
Abstract
Masked Diffusion Language Models (MDLMs) replace autoregressive generation with iterative demasking, and their privacy properties are largely unstudied. We study membership inference attacks (MIA) on fine-tuned MDLMs and show they are significantly more vulnerable than current grey-box baselines suggest. We extract a 46-dimensional feature vector from the model's reconstruction loss at four masking ratios and train XGBoost and MLP classifiers on top. On the MIMIR benchmark, across six text domains, XGBoost achieves a mean AUC of 0.878, peaking at 0.930 on Pile CC, and beats the SAMA grey-box baseline by 0.062 AUC on average. A leave-one-signal-out ablation shows that the ELBO trajectory alone drives most of this, with a mean drop of 0.130 when removed, while attention features add almost nothing (below 0.003). We also design a shadow model transfer attack where K = 3 surrogate MDLMs trained on data…
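To make the feature-extraction step in the abstract concrete, the following is a rough sketch of how a loss trajectory across masking ratios could be turned into a feature dictionary. It is not the paper's 46-dimensional feature set: the ratio values, the summary statistics, and the names `trajectory_features`, `MASK_RATIOS`, and `toy_loss` are all assumptions made for illustration, and `toy_loss` is a stand-in for a real MDLM's reconstruction loss.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

# Assumed ratios: the abstract says four masking ratios are used but does not list them.
MASK_RATIOS = (0.2, 0.4, 0.6, 0.8)

# Hypothetical interface: (tokens, masking ratio, rng) -> scalar reconstruction loss.
LossFn = Callable[[Sequence[int], float, random.Random], float]


def trajectory_features(
    tokens: Sequence[int],
    loss_fn: LossFn,
    ratios: Sequence[float] = MASK_RATIOS,
    draws: int = 8,
    seed: int = 0,
) -> Dict[str, float]:
    """Summarise the loss at each masking ratio, plus an overall slope."""
    rng = random.Random(seed)
    feats: Dict[str, float] = {}
    means = []
    for r in ratios:
        # Average over several random maskings to reduce sampling noise.
        losses = [loss_fn(tokens, r, rng) for _ in range(draws)]
        mean = statistics.fmean(losses)
        feats[f"loss_mean@{r}"] = mean
        feats[f"loss_std@{r}"] = statistics.pstdev(losses)
        means.append(mean)
    # Change in mean loss per unit of masking ratio, from lowest to highest ratio.
    feats["loss_slope"] = (means[-1] - means[0]) / (ratios[-1] - ratios[0])
    return feats


def toy_loss(tokens: Sequence[int], ratio: float, rng: random.Random) -> float:
    # Stand-in for a model: loss grows with the masking ratio, plus small noise.
    return len(tokens) * ratio + rng.uniform(-0.01, 0.01)


features = trajectory_features([1, 2, 3, 4, 5], toy_loss)
```

In a pipeline like the one the abstract describes, a vector of this kind would be computed for each candidate text and passed to a classifier such as XGBoost or an MLP, trained on examples whose membership status is known.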
