Provably Robust Watermarks for Open-Source Language Models

Miranda Christ; Sam Gunn; Tal Malkin; Mariana Raykova

arXiv:2410.18861·cs.CR·October 25, 2024

Open Access

TL;DR

This paper introduces a watermarking scheme for open-source language models that embeds the watermark in the model's parameters. The watermark can be detected from the model's outputs alone, and the authors prove it is unremovable under certain assumptions about the adversary's knowledge.

Contribution

It presents the first watermarking scheme for open-source LLMs: the watermark is embedded by modifying the model parameters, remains detectable from outputs alone, and is proven unremovable under certain assumptions about the adversary's knowledge.

Findings

01 Robust to token substitution and parameter perturbation attacks

02 Watermarks are detectable from outputs alone, without secret model information

03 Attackers must significantly degrade model quality to bypass detection

Abstract

The recent explosion of high-quality language models has necessitated new methods for identifying AI-generated text. Watermarking is a leading solution and could prove to be an essential tool in the age of generative AI. Existing approaches embed watermarks at inference and crucially rely on the large language model (LLM) specification and parameters being secret, which makes them inapplicable to the open-source setting. In this work, we introduce the first watermarking scheme for open-source LLMs. Our scheme works by modifying the parameters of the model, but the watermark can be detected from just the outputs of the model. Perhaps surprisingly, we prove that our watermarks are unremovable under certain assumptions about the adversary's knowledge. To demonstrate the behavior of our construction under concrete parameter instantiations, we present experimental results with OPT-6.7B and…
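To make the idea of "modify the parameters, detect from the outputs" concrete, here is a toy sketch. It is an illustrative assumption, not the construction from the paper: the "model" is reduced to a single vector of per-token logit biases, the watermark is a small Gaussian perturbation `delta` added to that vector, and the detector (which is assumed to hold `delta` as a key) scores a text by summing `delta` over its tokens. All names (`embed_watermark`, `sample_tokens`, `detection_score`) and all numeric settings are hypothetical.

```python
import numpy as np


def embed_watermark(bias, sigma, rng):
    """Return a perturbed copy of the bias vector and the perturbation itself.

    Assumption: the watermark is zero-mean Gaussian noise added to per-token
    logit biases. This is a toy stand-in for "modifying model parameters".
    """
    delta = rng.normal(0.0, sigma, size=bias.shape)
    delta -= delta.mean()  # center so unwatermarked text scores near zero
    return bias + delta, delta


def sample_tokens(bias, n, rng):
    """Sample n token ids i.i.d. from softmax(bias) (a context-free toy model)."""
    logits = bias - bias.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(bias), size=n, p=probs)


def detection_score(tokens, delta):
    """Normalized sum of delta over the observed tokens.

    Uses only the text and the key delta, not the model parameters.
    Larger values suggest the text came from the watermarked model.
    """
    n = len(tokens)
    return delta[tokens].sum() / (delta.std() * np.sqrt(n))


rng = np.random.default_rng(0)
vocab_size = 1000                 # hypothetical vocabulary size
base_bias = np.zeros(vocab_size)  # hypothetical unwatermarked parameters

marked_bias, delta = embed_watermark(base_bias, sigma=0.5, rng=rng)

z_marked = detection_score(sample_tokens(marked_bias, 2000, rng), delta)
z_plain = detection_score(sample_tokens(base_bias, 2000, rng), delta)

print(f"score on watermarked output:   {z_marked:.1f}")
print(f"score on unwatermarked output: {z_plain:.1f}")
```

In this toy setting the watermarked model over-samples tokens with positive `delta`, so its score grows with text length while the unwatermarked score stays near zero. The sketch says nothing about the robustness or unremovability guarantees, which depend on the paper's actual construction and assumptions.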


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification