SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking
Chenxi Gu, Xiaoning Du, John Grundy

TL;DR
This paper introduces SSG, a vocabulary partitioning method that enhances watermark detectability in large language models by increasing watermark strength through logit-balanced splits.
Contribution
The paper proposes SSG, a novel vocabulary partitioning algorithm that improves watermarking effectiveness by raising the lower bound of watermark strength in LLMs.
Findings
SSG improves watermark detectability in code and math reasoning tasks.
Experiments show increased watermark strength with SSG compared to random partitioning.
SSG effectively enhances watermark robustness under low-entropy conditions.
Abstract
Watermarking has emerged as a promising technique for tracing the authorship of content generated by large language models (LLMs). Among existing approaches, the KGW scheme is particularly attractive due to its versatility, efficiency, and effectiveness in natural language generation. However, KGW's effectiveness degrades significantly under low-entropy settings such as code generation and mathematical reasoning. A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences. Our study revealed that the next-token probability distribution plays a critical role in determining how much, or even whether, we can modify token selection and, consequently, the effectiveness of watermarking. We refer to this characteristic, associated with the probability distribution of each token prediction, as \emph{watermark…
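To make the low-entropy problem concrete, here is a minimal sketch of KGW-style random partitioning, not of SSG itself, whose algorithm is not described in this summary. It assumes the commonly described KGW setup: the previous token seeds a pseudo-random split of the vocabulary into a "green" fraction `gamma`, and a bias `delta` is added to green logits. The function names, the `key` constant, and the toy 8-token vocabulary are all hypothetical choices for illustration.

```python
import numpy as np


def green_mask(prev_token: int, vocab_size: int,
               gamma: float = 0.5, key: int = 15485863) -> np.ndarray:
    """Randomly mark a gamma fraction of the vocabulary as green.

    The split is seeded by the previous token, so a detector that knows
    `key` can recompute it. `key` is an arbitrary illustrative constant.
    """
    rng = np.random.default_rng(key * (prev_token + 1))
    perm = rng.permutation(vocab_size)
    mask = np.zeros(vocab_size, dtype=bool)
    mask[perm[: int(gamma * vocab_size)]] = True
    return mask


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def green_mass_gain(logits: np.ndarray, mask: np.ndarray,
                    delta: float = 2.0) -> float:
    """Increase in total green-list probability after biasing green logits."""
    before = softmax(logits)[mask].sum()
    after = softmax(logits + delta * mask)[mask].sum()
    return float(after - before)


# Toy comparison on an 8-token vocabulary (values are illustrative only).
mask = green_mask(prev_token=42, vocab_size=8)

flat = np.zeros(8)      # high entropy: all tokens equally likely
peaked = np.zeros(8)    # low entropy: one token dominates
peaked[0] = 20.0

flat_gain = green_mass_gain(flat, mask)
peaked_gain = green_mass_gain(peaked, mask)
print(f"gain with flat logits:   {flat_gain:.4f}")
print(f"gain with peaked logits: {peaked_gain:.2e}")
```

With flat logits the bias moves a large share of probability onto green tokens; with a sharply peaked distribution the same bias moves almost none, whichever side of the split the dominant token lands on. This is the dependence on the next-token distribution that the abstract describes, and it is why a random split can leave little room for the watermark in code or math outputs.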
