Backdooring Masked Diffusion Language Models
Daniel Yiming Cao, Chengzhong Wang, Sheng-Yen Chou, Chengyu Huang, Pin-Yu Chen, Shengwei An

TL;DR
This paper introduces SHADOWMASK, a backdoor attack on masked diffusion language models (MDLMs) that steers text generation toward attacker-specified targets while preserving the model's utility on clean inputs and remaining effective against defenses.
Contribution
The work presents the first systematic backdoor attack method tailored for MDLMs, with a mathematical formulation and extensive evaluation demonstrating its effectiveness.
Findings
SHADOWMASK achieves near-100% attack success rate.
It outperforms standard data poisoning methods.
It remains effective under fine-tuning and defenses.
Abstract
Masked diffusion language models (MDLMs) are emerging as a compelling new paradigm for text generation, but their training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs rely on discrete state corruption and iterative denoising rather than continuous noising or left-to-right prediction. In this work, we present the first systematic study of training-time backdoor attacks on MDLMs. We propose SHADOWMASK, a backdoor attack that modifies the MDLM forward corruption process by replacing the standard all-mask terminal distribution with a trigger-mask mixture prior. This creates a dedicated denoising pathway from trigger-corrupted states to attacker-specified targets while preserving clean denoising behavior. We further provide a principled mathematical formulation…
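To make the abstract's description of the modified forward process more concrete, here is a minimal Python sketch of token-level corruption whose terminal distribution is either all-mask (the standard case) or a mixture of trigger and mask tokens. The abstract is cut off before the formulation, so everything specific here is an assumption made for illustration and not the paper's definition: the independent per-position mixture, the `trigger_prob` parameter, the `[TRG]` token, and the function name `corrupt`.

```python
import random

MASK = "[MASK]"
TRIGGER = "[TRG]"  # hypothetical trigger token, chosen only for this sketch


def corrupt(tokens, t, trigger_prob=0.0, rng=random):
    """Corrupt each token independently with probability t.

    A corrupted position becomes TRIGGER with probability trigger_prob
    and MASK otherwise. With trigger_prob=0.0 this reduces to ordinary
    absorbing-state masking, whose terminal state (t=1.0) is all-mask.
    The mixture form is an assumption, not the paper's formulation.
    """
    out = []
    for tok in tokens:
        if rng.random() < t:
            # Position is corrupted: draw from the trigger/mask mixture.
            out.append(TRIGGER if rng.random() < trigger_prob else MASK)
        else:
            # Position is left untouched at this noise level.
            out.append(tok)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    print(corrupt(sentence, t=0.5, rng=rng))                    # clean-style corruption
    print(corrupt(sentence, t=0.5, trigger_prob=0.3, rng=rng))  # trigger-mask mixture
```

Under these assumptions, a poisoned training example would pair a state corrupted with `trigger_prob > 0` with the attacker-specified target text, while clean examples keep `trigger_prob = 0.0` and their original text. That is one way to read the "dedicated denoising pathway" the abstract describes.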
