Watermarking Discrete Diffusion Language Models

Avi Bagchi, Akhil Bhimaraju, Moulik Choraria, Daniel Alabi, and Lav R. Varshney

arXiv:2511.02083·cs.CR·February 16, 2026

TL;DR

This paper introduces a watermarking method for discrete diffusion language models (DDLMs) that is reliably detectable, distortion-free, and easy to deploy without extensive hyperparameter tuning.

Contribution

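The paper presents one of the first watermarking techniques for DDLMs. It applies a distribution-preserving Gumbel-max sampling trick at every diffusion step, seeds the randomness by sequence position, and comes with a proof that the false detection probability decays exponentially in the sequence length, without extensive hyperparameter tuning.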

Findings

01. Reliable detectability is demonstrated empirically on LLaDA, a state-of-the-art DDLM.

02. The watermark is distortion-free, with a false detection probability that decays exponentially in the sequence length.

03. The method is straightforward to deploy and scale across models.
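To give a feel for why a false detection probability can decay exponentially in the sequence length, here is a generic Chernoff-bound sketch. It is not the paper's proof. It assumes a hypothetical detector that sums one per-token score $E_i$ over $n$ positions, where the $E_i$ are i.i.d. $\mathrm{Exp}(1)$ on text produced without the key, and that flags the text when the sum exceeds $(1+\delta)n$ for some margin $\delta > 0$. The symbols $E_i$, $S_n$, $\delta$, and $\lambda$ are introduced here for illustration only.

```latex
\[
S_n = \sum_{i=1}^{n} E_i, \qquad E_i \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(1)
\quad \text{(unwatermarked text)}
\]
\[
\Pr\bigl[S_n \ge (1+\delta)n\bigr]
\;\le\; \inf_{0<\lambda<1} e^{-\lambda(1+\delta)n}\,(1-\lambda)^{-n}
\;=\; \exp\bigl(-n\,[\delta - \ln(1+\delta)]\bigr)
\]
```

The infimum is attained at $\lambda = \delta/(1+\delta)$. Since $\delta - \ln(1+\delta) > 0$ for every $\delta > 0$, the bound shrinks exponentially as $n$ grows under these assumptions.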

Abstract

Watermarking has emerged as a promising technique to track AI-generated content and differentiate it from authentic human creations. While prior work extensively studies watermarking for autoregressive large language models (LLMs) and image diffusion models, it remains comparatively underexplored for discrete diffusion language models (DDLMs), which are becoming popular due to their high inference throughput. In this paper, we introduce one of the first watermarking methods for DDLMs. Our approach applies a distribution-preserving Gumbel-max sampling trick at every diffusion step and seeds the randomness by sequence position to enable reliable detection. We empirically demonstrate reliable detectability on LLaDA, a state-of-the-art DDLM. We also analytically prove that the watermark is distortion-free, with a false detection probability that decays exponentially in the sequence length.…
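The two ingredients named in the abstract, Gumbel-max sampling and position-seeded randomness, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the SHA-256 seeding scheme, the toy Dirichlet "model", and the detection statistic (a sum of `-log(1 - u)` over the chosen tokens, in the style of earlier Gumbel-max watermarks for autoregressive models). The paper's actual detector and seeding may differ.

```python
import hashlib
import numpy as np

def position_uniforms(key: bytes, position: int, vocab_size: int) -> np.ndarray:
    """Pseudorandom uniforms for one sequence position, derived from a secret key.

    Hypothetical seeding scheme: hash (key, position) and use it as an RNG seed.
    """
    digest = hashlib.sha256(key + position.to_bytes(8, "big")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).random(vocab_size)

def gumbel_max_sample(probs: np.ndarray, u: np.ndarray) -> int:
    """Return argmax_i (log p_i + g_i), with Gumbel noise g_i = -log(-log u_i).

    If u were truly uniform, the result would be distributed exactly as probs.
    """
    u = np.clip(u, 1e-12, 1.0 - 1e-12)      # avoid log(0)
    gumbel = -np.log(-np.log(u))
    with np.errstate(divide="ignore"):      # log(0) = -inf is fine for argmax
        log_probs = np.log(probs)
    return int(np.argmax(log_probs + gumbel))

def detection_score(tokens, key: bytes, vocab_size: int) -> float:
    """Sum of -log(1 - u) at the observed tokens (assumed statistic).

    Without the key, each term behaves like an Exp(1) draw, so the score is
    close to len(tokens); watermarked text tends to score much higher.
    """
    score = 0.0
    for pos, tok in enumerate(tokens):
        u = position_uniforms(key, pos, vocab_size)[tok]
        score += -np.log(1.0 - u)
    return float(score)

def demo(n: int = 200, vocab_size: int = 50, key: bytes = b"secret"):
    """Toy stand-in for a diffusion sampler: positions are filled in random order."""
    rng = np.random.default_rng(0)
    tokens = np.zeros(n, dtype=int)
    for pos in rng.permutation(n):
        probs = rng.dirichlet(np.ones(vocab_size))   # toy per-position distribution
        u = position_uniforms(key, int(pos), vocab_size)
        tokens[pos] = gumbel_max_sample(probs, u)
    plain = rng.integers(vocab_size, size=n)         # text generated without the key
    return (detection_score(tokens, key, vocab_size),
            detection_score(plain, key, vocab_size))

if __name__ == "__main__":
    watermarked, unwatermarked = demo()
    print(f"watermarked score:   {watermarked:.1f}")
    print(f"unwatermarked score: {unwatermarked:.1f}")
```

Because the uniforms depend only on the key and the position, the detector in this sketch does not need to know the order in which the diffusion sampler unmasked the tokens, which is one plausible reason to seed by position rather than by preceding context.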

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Topic Modeling