Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark

Yuhang Cai; Yaofei Wang; Donghui Hu; Chen Gu

arXiv:2502.08332·cs.CR·March 4, 2025

Open Access

TL;DR

This paper presents a novel watermarking technique for LLM outputs that enables simultaneous detection of modifications and generated text, enhancing the security and integrity of language model outputs.

Contribution

The authors introduce a new unbiased watermark method and a 'discarded tokens' metric to detect both modifications and generated text in LLM outputs, addressing limitations of existing methods.

Findings

1. Effective dual detection of modifications and generated text achieved
2. New metric 'discarded tokens' correlates with modifications
3. Improved watermark detection robustness demonstrated

Abstract

The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. For example, attackers can alter the watermarked text to produce harmful content without compromising the presence of the watermark, which could lead to false attribution of this malicious content to the LLM. This situation poses a serious threat to LLM service providers and highlights the significance of achieving modification detection and generated-text detection simultaneously. Therefore, we propose a technique to detect modifications in text for an unbiased watermark that is sensitive to modification. We introduce a new metric called "discarded tokens",…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Spam and Phishing Detection · Authorship Attribution and Profiling