Membership Inference Attacks Against Fine-tuned Diffusion Language Models

Yuetian Chen; Kaiyuan Zhang; Yuntao Du; Edoardo Stoppa; Charles Fleming; Ashish Kundu; Bruno Ribeiro; Ninghui Li

arXiv:2601.20125·cs.LG·February 10, 2026

Open Access 3 Reviews

TL;DR

This paper systematically investigates privacy vulnerabilities in Diffusion Language Models (DLMs) using Membership Inference Attacks, introducing SAMA, a novel attack method that significantly improves detection accuracy and reveals critical privacy risks.

Contribution

The paper is the first to analyze MIA vulnerabilities in DLMs and proposes SAMA, a robust aggregation attack method tailored for the multiple maskable configurations of DLMs.

Findings

01. SAMA achieves 30% relative AUC improvement over baselines.

02. Up to 8 times better performance at low false positive rates.

03. Reveals significant privacy vulnerabilities in DLMs.
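The first two findings are stated in terms of AUC and of performance at low false positive rates. As a rough illustration of how such numbers are usually computed for a membership inference attack, here is a minimal sketch. The function names and all scores are hypothetical and chosen for the example; none of the values come from the paper.

```python
from typing import Sequence


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores: Sequence[float],
               nonmember_scores: Sequence[float],
               max_fpr: float) -> float:
    """True positive rate at the loosest threshold whose FPR stays within max_fpr."""
    neg = sorted(nonmember_scores, reverse=True)
    allowed = int(max_fpr * len(neg))  # number of false positives tolerated
    # Predict "member" when score > threshold; at most `allowed` non-members pass.
    threshold = neg[allowed] if allowed < len(neg) else float("-inf")
    return sum(s > threshold for s in member_scores) / len(member_scores)


def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, e.g. 0.30 means 30%."""
    return (new - base) / base


# Toy scores, invented for illustration only.
members = [0.9, 0.8, 0.4]
nonmembers = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc(members, nonmembers)
toy_tpr = tpr_at_fpr(members, nonmembers, max_fpr=0.25)
```

Under this reading, a "30% relative AUC improvement" means `relative_improvement(new_auc, baseline_auc)` is about 0.30, which is a smaller absolute gap than 30 AUC points.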

Abstract

Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike the autoregressive models' single fixed prediction pattern, DLMs' multiple maskable configurations exponentially increase attack opportunities. This ability to probe many independent masks dramatically improves detection chances. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted…
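The abstract is cut off before it finishes describing SAMA, so the following is only a loose sketch of the two ideas it does name: sampling masks at several densities and aggregating with sign-based statistics. It is not the authors' algorithm. The comparison against a reference model, the `LossFn` callback signature, every parameter default, and the plain average across densities are assumptions made for illustration; the weighting scheme the abstract begins to mention is deliberately left out.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical callback: given the tokens and the masked positions, return the
# model's per-token loss at each masked position (same order as `masked`).
LossFn = Callable[[Sequence[int], Sequence[int]], List[float]]


def sign_vote_score(tokens: Sequence[int],
                    target_loss: LossFn,
                    reference_loss: LossFn,
                    densities: Sequence[float] = (0.1, 0.3, 0.5),
                    masks_per_density: int = 8,
                    subset_size: int = 4,
                    subsets_per_mask: int = 16,
                    seed: int = 0) -> float:
    """Fraction of sampled subsets on which the target model beats the reference.

    Each subset contributes only a 0/1 vote (the sign of its mean loss gap),
    so a few extreme per-token losses cannot dominate the final score.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    rng = random.Random(seed)
    n = len(tokens)
    per_density = []
    for density in densities:
        # Mask at least `subset_size` positions, but never more than n.
        k = min(n, max(subset_size, round(density * n)))
        votes = []
        for _ in range(masks_per_density):
            masked = rng.sample(range(n), k)
            t = target_loss(tokens, masked)
            r = reference_loss(tokens, masked)
            gaps = [ri - ti for ti, ri in zip(t, r)]  # > 0: target fits better
            for _ in range(subsets_per_mask):
                idx = rng.sample(range(k), min(subset_size, k))
                mean_gap = sum(gaps[i] for i in idx) / len(idx)
                votes.append(1.0 if mean_gap > 0 else 0.0)
        per_density.append(sum(votes) / len(votes))
    # Unweighted average across densities (an assumption, see lead-in).
    return sum(per_density) / len(per_density)


# Toy usage with constant fake losses instead of real models.
toy_tokens = list(range(20))
member_like = sign_vote_score(toy_tokens,
                              lambda tok, m: [1.0] * len(m),
                              lambda tok, m: [2.0] * len(m))
```

A higher score would be read as stronger evidence of membership. Note that each sampled mask costs one query to each model, which is the query cost Reviewer 03 raises.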

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The proposed SAMA method is innovative since it is the first work to systematically study MIA risks for Diffusion Language Models.
2. The experimental setup is comprehensive, evaluating SAMA on state-of-the-art DLMs across diverse datasets against multiple baselines. The results are compelling and the ablation studies effectively justify the design choices.
3. The paper is written in a clear structure.

Weaknesses

1. The SAMA framework involves multiple components (e.g., progressive masking, subset aggregation, adaptive weighting) and hyperparameters. While effective, the method is somewhat complex. A more intuitive explanation and stronger justification for the necessity of each component could enhance clarity.
2. It would be good to propose or evaluate more defensive strategies to improve practical applicability.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The work is highly significant and original as it provides the first systematic study of Membership Inference Attacks against the emerging and important class of Diffusion Language Models. It tackles a critical and underexplored privacy problem.
- The quality of the work is high. The proposed SAMA attack is technically sound and specifically tailored to the unique properties of DLMs. The empirical evaluation is comprehensive, and the substantial performance gains over a wide range of baselin

Weaknesses

- The grey-box access model assumed in Section 2.1 appears to be quite strong. It posits that the adversary can query the target model with arbitrary, custom partially masked sequences and receive detailed outputs like logits for specific token positions. This may not be realistic in many practical scenarios. For instance, with closed-source models, an attacker might only be able to access the final output sequence and its token probabilities via an API. Conversely, with open-source models, an a

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper focuses on a new problem by systematically investigating membership inference vulnerabilities in diffusion-based language models, a model family that differs fundamentally from autoregressive architectures.
2. The proposed SAMA framework combines progressive masking and sign-based subset aggregation to effectively expose membership vulnerabilities.
3. The experimental evaluation is extensive and well-controlled, spanning multiple datasets, and various MIA baselines.
4. The abl

Weaknesses

1. The proposed method is specialized for diffusion-based language models and depends on access to compatible reference models, limiting its generalizability to broader LLM architectures.
2. The intuition behind progressive masking and subset aggregation remains somewhat abstract; a concrete example with simple proof or visualization would make the mechanism more interpretable.
3. The method incurs high query complexity due to multiple mask configurations, raising scalability concerns for la

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection