Segmenting Watermarked Texts From Language Models
Xingchi Li, Guanxun Li, Xianyang Zhang

TL;DR
This paper introduces a statistical method that detects whether a published text was generated by a watermarked language model and segments it into watermarked and non-watermarked sub-strings, even after the text has been modified, so that its source remains traceable.
Contribution
The paper proposes a change point detection approach, built on randomization tests, that identifies watermarked sub-strings in LLM-generated texts, tolerates user modifications (substitutions, insertions, deletions), and provides statistical error control.
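To make the idea of locating a change point concrete, here is a minimal sketch. It is not the authors' procedure: it swaps in a plain CUSUM-style estimator (the split that maximizes a scaled difference of means) and omits any error control. The input `scores` is a hypothetical sequence of per-window p-values, assumed to sit near 0.5 on non-watermarked text and near 0 on watermarked text; the function name `best_split` is likewise an illustration only.

```python
def best_split(scores):
    """Return (index, statistic) of the split that maximizes the
    scaled absolute difference between left and right means."""
    n = len(scores)
    total = sum(scores)
    best_k, best_stat = None, 0.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += scores[k - 1]
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (n - k)
        # The scaling factor keeps splits near the ends from dominating.
        stat = (k * (n - k) / n) ** 0.5 * abs(left_mean - right_mean)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Hypothetical p-value sequence: 30 non-watermarked windows, then 20 watermarked ones.
scores = [0.5] * 30 + [0.01] * 20
split, stat = best_split(scores)
print(split)
```

On this noise-free toy input the estimated split coincides with the boundary between the two regimes. Real p-value sequences are noisy, so a practical method also needs a significance threshold before declaring a change point, which this sketch leaves out.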
Findings
Accurately detects watermarked segments in generated texts
Handles modifications such as substitutions, insertions, and deletions
Demonstrates effectiveness on texts from multiple language models
Abstract
Watermarking is a technique that involves embedding nearly unnoticeable statistical signals within generated content to help trace its source. This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted language model (LLM) provider, who then generates a text from their LLM with a watermark. This setup makes it possible for a detector to later identify the source of the text if the user publishes it. The user can modify the generated text by substitutions, insertions, or deletions. Our objective is to develop a statistical method to detect if a published text is LLM-generated from the perspective of a detector. We further propose a methodology to segment the published text into watermarked and non-watermarked sub-strings. The proposed approach is built upon randomization tests and change point detection techniques. We demonstrate that our method…
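The randomization-test idea mentioned in the abstract can be illustrated with a toy example. This is a sketch under strong simplifying assumptions, not the paper's test: tokens are binary, the watermark key is a list of uniform random numbers, a watermarked token is 1 exactly when its key value is below 0.5, and the test statistic `agreement` is a hypothetical choice. The p-value compares the observed statistic with its values under freshly drawn keys, which are independent of the text.

```python
import random


def agreement(tokens, key):
    """Fraction of positions where the token equals the bit implied by the key."""
    return sum(t == (u < 0.5) for t, u in zip(tokens, key)) / len(tokens)


def randomization_p_value(tokens, key, num_resamples=999, seed=0):
    """Randomization-test p-value: rank the observed statistic among
    statistics computed with fresh keys that are independent of the text."""
    rng = random.Random(seed)
    observed = agreement(tokens, key)
    exceed = 0
    for _ in range(num_resamples):
        fresh_key = [rng.random() for _ in tokens]
        if agreement(tokens, fresh_key) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive.
    return (1 + exceed) / (1 + num_resamples)


key_rng = random.Random(42)
key = [key_rng.random() for _ in range(100)]

watermarked = [int(u < 0.5) for u in key]                      # generated with the key
text_rng = random.Random(7)
unwatermarked = [text_rng.randint(0, 1) for _ in range(100)]   # independent of the key

p_wm = randomization_p_value(watermarked, key)
p_plain = randomization_p_value(unwatermarked, key)
print(p_wm, p_plain)
```

The watermarked sequence agrees with the key at every position, so essentially no resampled key matches it and the p-value reaches its smallest possible value, 1 / (num_resamples + 1). For text that is independent of the key, the p-value is roughly uniform on (0, 1].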
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
