Improving the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
Weiqing He, Xiang Li, Li Shen, Weijie Su, Qi Long

TL;DR
This paper presents a new approach to watermarking language models under speculative sampling. By making draft-token acceptance pseudorandom, it improves watermark detectability without sacrificing inference speed.
Contribution
It introduces a quantitative measure of watermark strength, characterizes the trade-off as an optimization problem, and proposes a mechanism to maximize watermark strength while maintaining sampling efficiency.
Findings
Maximized watermark detectability without reducing sampling efficiency.
Derived explicit Pareto curves for existing watermark schemes.
Demonstrated improved practical deployment of watermarking in language models.
Abstract
Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its deployment in practice is hindered by inference inefficiency. Speculative sampling accelerates inference, with efficiency improving as the acceptance rate between draft and target models increases. Yet recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We revisit this trade-off and show it is not absolute. We introduce a quantitative measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers. Using this measure, we fully characterize the trade-off as a constrained optimization problem and derive explicit Pareto curves for two existing watermarking schemes. Finally, we introduce a principled mechanism that…
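To make the mechanics concrete, here is a minimal Python sketch of one standard speculative sampling step in which the draft draw, the accept coin, and the residual draw are all derived from a keyed hash instead of fresh randomness. It illustrates the general idea only and is not the authors' algorithm: the helper `pseudorandom_uniform`, the `key` and `context` arguments, and the SHA-256 construction are assumptions made for this example. The `acceptance_rate` function uses the standard identity that speculative sampling accepts a draft token with probability $\sum_x \min(P(x), Q(x)) = 1 - \mathrm{TV}(P, Q)$.

```python
import hashlib


def acceptance_rate(p, q):
    """Expected acceptance probability: sum_x min(p(x), q(x)) = 1 - TV(p, q)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def pseudorandom_uniform(key, context, tag):
    """Hypothetical helper: hash (key, context, tag) to a number in [0, 1)."""
    digest = hashlib.sha256(f"{key}|{context}|{tag}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sample_from(dist, u):
    """Inverse-CDF sampling: return the first token whose cumulative mass exceeds u."""
    cumulative = 0.0
    for token, prob in enumerate(dist):
        cumulative += prob
        if u < cumulative:
            return token
    return len(dist) - 1  # guard against floating-point round-off


def speculative_step(p, q, key, context):
    """One draft/verify step for target p and draft q with keyed coins.

    Returns (token, accepted). All three coins are deterministic functions
    of (key, context); this is an assumption of the sketch.
    """
    u_draft = pseudorandom_uniform(key, context, "draft")
    u_accept = pseudorandom_uniform(key, context, "accept")
    u_resid = pseudorandom_uniform(key, context, "residual")

    draft = sample_from(q, u_draft)
    # Accept with probability min(1, p[draft] / q[draft]).
    if u_accept * q[draft] < p[draft]:
        return draft, True

    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(residual)
    residual = [r / total for r in residual]
    return sample_from(residual, u_resid), False


if __name__ == "__main__":
    target = [0.5, 0.3, 0.2]  # toy target distribution P
    draft = [0.4, 0.4, 0.2]   # toy draft distribution Q
    print(acceptance_rate(target, draft))
    print(speculative_step(target, draft, key="secret", context="prefix-0"))
```

Because every coin is a function of `key` and `context`, rerunning the step with the same inputs reproduces the same token, which is what lets a party holding the key recompute the coins at detection time. How the paper scores those coins is not shown here.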
Peer Reviews
Decision: ICLR 2026 Poster
Overall, this is a solid and creative work. Here are some of its strengths:
1. The paper is clearly written. Even though I have significant prior knowledge of both watermarking and speculative decoding, I believe this paper will not be too hard to follow for a newcomer to these fields.
2. The idea is novel and seemingly powerful. Furthermore, the authors address the combination of two important subjects in the AI community: trustworthiness and efficiency.
3. The analysis is very interestin
### Major:
- **Clarity on SynthID:** The paper uses SynthID as a case study in this work, as it is an 'unbiased watermark'. However, SynthID is a relatively large class of watermarks that follow the tournament sampling mechanism. The general SynthID watermark is not even unbiased (when N>2). Unfortunately, the authors do not provide sufficient information to understand which specific case of SynthID is proposed in this work. I also believe that such information should be added for the sake of c
The paper has several strong results. The basic observation, that one can overcome the limitation of Hu et al. by allowing for pseudorandom draft-token acceptance, is very nice and important. Watermarking under speculative sampling is an extremely important technical issue for deploying watermarks in real-world LLMs. I haven't read the paper carefully, but it seems to make real progress on an important practical question.
I don't know of any.
## Strengths
- **Conceptual clarity:** Moving from a binary notion to a continuous, information-theoretic strength is the right abstraction. It cleanly explains earlier "impossibility" statements and yields a usable frontier.
- **Tight theory that lines up with practice:** The Chernoff–Stein interpretation makes sample-complexity predictions directly actionable; the entropy and TV bounds give interpretable ceilings.
- **Simple, impactful algorithmic tweak:** Seeding the accept coin is minima
## Weaknesses / limitations
- **Oracle knowledge vs deployment reality:** Many results assume access to the base distribution $P_t$ (or accurate logits) and, for the strongest detectors, the **accept coin**. External forensics often lacks both; estimation error can materially affect strength and false positives.
- **Model/scale realism:** Simulated $(Q,P)$ pairs and small-vocab settings may not capture calibration quirks, long-tail tokens, or beam-blocking in large models.
- **Robustness / a
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
