QuantileMark: A Message-Symmetric Multi-bit Watermark for LLMs

Junlin Zhu; Baizhou Huang; Xiaojun Wan

arXiv:2604.13786 · cs.CL · April 16, 2026

TL;DR

QuantileMark introduces a message-symmetric multi-bit watermarking method for large language models, embedding messages within continuous probability intervals to ensure unbiased detection and maintain text quality.

Contribution

The paper proposes a white-box multi-bit watermarking technique that guarantees message-unbiasedness and improves robustness without degrading generation quality.

Findings

01. Enhanced multi-bit recovery accuracy
02. Improved detection robustness over baselines
03. Negligible impact on text generation quality

Abstract

As large language models become standard backends for content generation, practical provenance increasingly requires multi-bit watermarking. In provider-internal deployments, a key requirement is message symmetry: the message itself should not systematically affect either text quality or verification outcomes. Vocabulary-partition watermarks can break message symmetry in low-entropy decoding: some messages are assigned most of the probability mass, while others are forced to use tail tokens. This makes embedding quality and message decoding accuracy message-dependent. We propose QuantileMark, a white-box multi-bit watermark that embeds messages within the continuous cumulative probability interval $[0, 1)$. At each step, QuantileMark partitions this interval into $M$ equal-mass bins and samples strictly from the bin assigned to the target symbol, ensuring a fixed $1/M$ probability…
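
The equal-mass binning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_from_bin`, the fixed token ordering, and the direct mapping of symbol `s` to the sub-interval $[s/M, (s+1)/M)$ are assumptions made here for illustration. The abstract does not say how bins are assigned to symbols or how tokens are ordered.

```python
import bisect
import random
from itertools import accumulate


def sample_from_bin(probs, symbol, num_bins, rng):
    """Inverse-CDF sampling restricted to one equal-mass bin of [0, 1).

    probs    : next-token probabilities in a fixed token order (assumed)
    symbol   : message symbol in {0, ..., num_bins - 1}
    num_bins : M, the number of equal-mass bins
    rng      : a random.Random instance
    """
    if not 0 <= symbol < num_bins:
        raise ValueError("symbol must lie in [0, num_bins)")
    # Hypothetical bin assignment: symbol s owns [s/M, (s+1)/M).
    u = (symbol + rng.random()) / num_bins
    # Cumulative distribution over the token order.
    cdf = list(accumulate(probs))
    # Index of the first token whose cumulative mass exceeds u;
    # the clamp guards against floating-point round-off at the top end.
    return min(bisect.bisect_right(cdf, u), len(probs) - 1)


# Low-entropy example: one token holds 90% of the mass.
skewed = [0.9, 0.05, 0.03, 0.02]
rng = random.Random(0)
tokens = [sample_from_bin(skewed, s, 4, rng) for s in range(4)]
```

In this sketch every bin carries mass $1/M$ of the cumulative interval no matter how skewed `probs` is, so no symbol is pushed onto tail tokens alone. If the symbol is drawn uniformly, `u` is uniform on $[0, 1)$ and the sampled token follows `probs` exactly.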

Code & Models

Repositories

zzzjunlin/QuantileMark (GitHub)
