A Certified Robust Watermark For Large Language Models

Xianheng Feng; Jian Liu; Kui Ren; Chun Chen

arXiv:2409.19708·cs.CR·October 1, 2024

TL;DR

This paper introduces a certified robust watermarking method for large language models. It applies randomized smoothing to the watermark detector, giving provable guarantees against watermark removal attacks while keeping performance comparable to baseline watermark algorithms.

Contribution

We propose what is, to the best of our knowledge, the first certified robust watermark algorithm for large language models. It is based on randomized smoothing and provides provable robustness guarantees.

Findings

1. Achieves performance comparable to baseline watermark algorithms.

2. Provides substantial certified robustness against removal attacks.

3. Demonstrates effectiveness through comprehensive empirical evaluations.

Abstract

The effectiveness of watermark algorithms in identifying AI-generated text has garnered significant attention. Concurrently, an increasing number of watermark algorithms have been proposed to improve robustness against various watermark attacks. However, these algorithms remain susceptible to adaptive or unseen attacks. To address this issue, we propose what is, to the best of our knowledge, the first certified robust watermark algorithm for large language models based on randomized smoothing, which can provide provable guarantees for watermarked text. Specifically, we use two different models for watermark generation and detection, respectively. During the training and inference stages of the watermark detector, we add Gaussian noise in the embedding space and uniform noise in the permutation space to enhance the certified robustness of our watermark detector and derive certified…
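To make the randomized smoothing idea in the abstract concrete, here is a minimal sketch of a Gaussian-smoothed binary detector. It is not the paper's implementation. The detector `toy_detector`, the function `smoothed_detect`, and all parameter values are hypothetical. The radius uses the standard Gaussian smoothing bound for the L2 norm (noise scale times the inverse normal CDF of a lower bound on the top-class probability), which may differ from the bound the authors derive. It uses a simple Hoeffding confidence bound, reuses the same samples to pick and to bound the top class, and does not cover the uniform noise in permutation space.

```python
import math
from statistics import NormalDist

import numpy as np


def toy_detector(embeddings: np.ndarray) -> int:
    """Hypothetical stand-in detector: 1 (watermarked) if the mean value is positive."""
    return int(embeddings.mean() > 0.0)


def smoothed_detect(embeddings, detector, sigma=0.5, n_samples=1000, alpha=0.001, seed=0):
    """Majority vote of `detector` under Gaussian embedding noise.

    Returns (label, radius). The label is None (abstain) when the vote is not
    confident enough to certify anything. This is an illustrative assumption,
    not the authors' procedure.
    """
    rng = np.random.default_rng(seed)
    votes = 0
    for _ in range(n_samples):
        # Perturb the embeddings with isotropic Gaussian noise and query the detector.
        noisy = embeddings + rng.normal(0.0, sigma, size=embeddings.shape)
        votes += detector(noisy)

    top_label = 1 if 2 * votes >= n_samples else 0
    top_count = votes if top_label == 1 else n_samples - votes

    # One-sided Hoeffding lower bound on the top-class probability.
    # Simplification: the same samples select and bound the top class.
    p_lower = top_count / n_samples - math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    if p_lower <= 0.5:
        return None, 0.0

    # Standard Gaussian smoothing radius for the binary case: sigma * Phi^{-1}(p_lower).
    radius = sigma * NormalDist().inv_cdf(min(p_lower, 1.0 - 1e-12))
    return top_label, radius


if __name__ == "__main__":
    label, radius = smoothed_detect(np.full((8, 16), 1.0), toy_detector)
    print(label, round(radius, 3))
```

In this sketch, a larger `sigma` can certify a larger radius but makes the base detector's job harder, which is presumably why the abstract describes adding noise during detector training as well as inference.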
