Segmenting Watermarked Texts From Language Models

Xingchi Li; Guanxun Li; Xianyang Zhang

arXiv:2410.20670 · cs.LG · October 29, 2024

TL;DR

This paper introduces a statistical method that detects whether a published text was generated by a watermarked language model and segments it into watermarked and non-watermarked sub-strings, even after the text has been modified, so that its source can still be traced.

Contribution

It proposes an approach, built on randomization tests and change point detection, that identifies watermarked sub-strings in a published text, tolerates substitutions, insertions, and deletions made by the user, and provides error control.
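
To make the change point idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a sequence of per-position p-values has already been computed (small where the text looks watermarked) and locates a single split with a CUSUM-style mean-shift contrast. The function name `cusum_change_point` and the toy p-value sequence are hypothetical, chosen only for illustration; the statistic used in the paper may differ.

```python
import numpy as np

def cusum_change_point(p_values):
    """Return (k, score) for the split index k that maximizes a
    CUSUM-style contrast between p_values[:k] and p_values[k:].
    This is a generic mean-shift statistic used as a stand-in."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    if n < 2:
        raise ValueError("need at least two p-values")
    best_k, best_score = 1, -np.inf
    for k in range(1, n):
        left, right = p[:k], p[k:]
        # Weighted absolute difference of segment means.
        score = np.sqrt(k * (n - k) / n) * abs(left.mean() - right.mean())
        if score > best_score:
            best_k, best_score = k, score
    return best_k, float(best_score)

# Toy input (an assumption): 50 small p-values, as a watermarked stretch
# might produce, followed by 50 larger ones with mean 0.5.
p_values = np.concatenate([np.full(50, 0.01), np.tile([0.3, 0.7], 25)])
k_hat, score = cusum_change_point(p_values)
print(k_hat, round(score, 2))
```

On this toy sequence the estimated split is index 50, the boundary between the two stretches. A practical procedure would also need a threshold or resampling step to decide whether the split is significant, and a way to handle more than one change point.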

Findings

01  Accurately detects watermarked segments in generated texts

02  Handles modifications such as substitutions, insertions, and deletions

03  Demonstrates effectiveness on texts from multiple language models

Abstract

Watermarking is a technique that involves embedding nearly unnoticeable statistical signals within generated content to help trace its source. This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted large language model (LLM) provider, who then generates a text from their LLM with a watermark. This setup makes it possible for a detector to later identify the source of the text if the user publishes it. The user can modify the generated text by substitutions, insertions, or deletions. Our objective is to develop a statistical method to detect if a published text is LLM-generated from the perspective of a detector. We further propose a methodology to segment the published text into watermarked and non-watermarked sub-strings. The proposed approach is built upon randomization tests and change point detection techniques. We demonstrate that our method…
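
The abstract names randomization tests as a building block without giving details here, so the following is only an illustrative sketch under stated assumptions. It assumes a toy watermark in which each token is chosen as the index of the largest value in a secret key row, and a hypothetical score `alignment_score` that measures how well the tokens agree with the key. The p-value compares the observed score against scores recomputed under freshly drawn keys; none of these names or choices are taken from the paper.

```python
import numpy as np

def alignment_score(keys, tokens):
    """Toy statistic (an assumption): sum of -log(1 - u) over the key
    values u found at each observed token. Large when tokens track the key."""
    u = keys[np.arange(len(tokens)), tokens]
    return float(np.sum(-np.log1p(-u)))

def randomization_p_value(keys, tokens, n_resamples=199, seed=0):
    """Randomization test: redraw the key n_resamples times and count how
    often the resampled score is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = alignment_score(keys, tokens)
    exceed = sum(
        alignment_score(rng.random(keys.shape), tokens) >= observed
        for _ in range(n_resamples)
    )
    # The +1 terms keep the p-value strictly positive and valid.
    return (1 + exceed) / (n_resamples + 1)

# Toy data (assumptions): 50 tokens over a 20-word vocabulary.
rng = np.random.default_rng(42)
n_tokens, vocab = 50, 20
keys = rng.random((n_tokens, vocab))               # secret key sequence
watermarked = keys.argmax(axis=1)                  # tokens that follow the key
unmarked = rng.integers(0, vocab, size=n_tokens)   # tokens independent of the key

p_wm = randomization_p_value(keys, watermarked)
p_plain = randomization_p_value(keys, unmarked)
print(p_wm, p_plain)
```

In this toy setting the watermarked text should receive a p-value near the smallest attainable value, 1 / (n_resamples + 1), while the p-value for the independent text is roughly uniform. Applying such a test to sliding windows of a published text would yield a sequence of p-values on which a change point method could then operate.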

Code & Models

Repositories

doccstat/llm-watermark-cpd
PyTorch · Official

Videos

Segmenting Watermarked Texts From Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling