VOW: Verifiable and Oblivious Watermark Detection for Large Language Models
Xiaokun Luan, Yihao Zhang, Pengcheng Su, Feiran Lei, Meng Sun

TL;DR
VOW introduces a privacy-preserving, verifiable watermark detection protocol for LLMs built on secure two-party computation and a Verifiable Oblivious Pseudorandom Function (VOPRF), enabling trustworthy detection without revealing the user's text to the provider.
Contribution
It presents the first practical, cryptographically secure watermark detection method that is both privacy-preserving and verifiable for large language models.
Findings
VOW remains practical for short texts, where prior asymmetric schemes are not.
It offers formal guarantees linking watermark insertion and detection.
VOW is robust against modern paraphrasing attacks.
Abstract
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text…
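The abstract names the building block (a VOPRF evaluated jointly by user and provider) but not the concrete protocol, so the following is only a minimal Python sketch of what one VOPRF-based detection round trip could look like, not VOW's actual construction. The assumptions are ours: a 2HashDH-style OPRF over a tiny, insecure toy group; a Chaum-Pedersen discrete-log-equality proof for verifiability (the same idea as the verifiable mode of RFC 9497, though not that specification's encoding); and a hypothetical green-list watermark in which a token is green when a bit derived from F_k(previous token, token) equals 1, scored with a one-proportion z-test at γ = 1/2. The names `keygen`, `evaluate`, `blind`, `verify`, `unblind`, and `detect` are illustrative.

```python
import hashlib
import math
import secrets
from typing import Callable

# Toy prime-order group: quadratic residues mod the safe prime P = 2*Q + 1.
# Deliberately tiny and NOT secure; real systems use a standard elliptic-curve
# group such as the RFC 9497 ciphersuites.
P, Q = 1019, 509
G = 4  # 2**2 mod P; every non-identity residue generates the order-Q subgroup

Proof = tuple[int, int]


def h_int(*parts: object) -> int:
    """SHA-256 over the joined parts, read as a big-endian integer."""
    data = "|".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def hash_to_group(msg: str) -> int:
    # Squaring maps any nonzero residue into the order-Q subgroup.
    return pow(h_int("h2g", msg) % (P - 1) + 1, 2, P)


# ---- Provider side (holds the secret key k) ---------------------------------
def keygen() -> tuple[int, int]:
    k = secrets.randbelow(Q - 1) + 1
    return k, pow(G, k, P)  # (secret key, public key published in advance)


def evaluate(k: int, pk: int, blinded: int) -> tuple[int, Proof]:
    """Return blinded**k with a Chaum-Pedersen (DLEQ) proof that the same k
    satisfies pk == G**k."""
    z = pow(blinded, k, P)
    r = secrets.randbelow(Q - 1) + 1
    c = h_int(G, pk, blinded, z, pow(G, r, P), pow(blinded, r, P)) % Q
    return z, (c, (r - c * k) % Q)


# ---- User side (holds the text) ---------------------------------------------
def blind(msg: str) -> tuple[int, int]:
    b = secrets.randbelow(Q - 1) + 1
    return b, pow(hash_to_group(msg), b, P)  # H(msg)**b hides msg from provider


def verify(pk: int, blinded: int, z: int, proof: Proof) -> bool:
    c, s = proof
    if pow(z, Q, P) != 1:  # reject elements outside the subgroup
        return False
    a1 = pow(G, s, P) * pow(pk, c, P) % P        # equals G**r if honest
    a2 = pow(blinded, s, P) * pow(z, c, P) % P   # equals blinded**r if honest
    return c == h_int(G, pk, blinded, z, a1, a2) % Q


def unblind(b: int, z: int) -> int:
    return pow(z, pow(b, -1, Q), P)  # (H(m)**(b*k))**(1/b) == H(m)**k


# ---- Detection under a HYPOTHETICAL green-list rule (gamma = 1/2) -----------
def detect(tokens: list[str], pk: int,
           query: Callable[[int], tuple[int, Proof]]) -> float:
    """z-score of green tokens, where (prev, tok) is green iff a bit derived
    from F_k(prev, tok) is 1. `query` is the round trip to the provider."""
    n = len(tokens) - 1
    if n < 1:
        raise ValueError("need at least two tokens")
    green = 0
    for prev, tok in zip(tokens, tokens[1:]):
        b, blinded = blind(f"{prev}\x00{tok}")
        z, proof = query(blinded)
        if not verify(pk, blinded, z, proof):
            raise ValueError("provider did not evaluate with the committed key")
        green += h_int("bit", unblind(b, z)) % 2
    return (green - 0.5 * n) / math.sqrt(0.25 * n)
```

Usage: the provider runs `k, pk = keygen()` and publishes `pk`; the user then calls `detect(tokens, pk, lambda x: evaluate(k, pk, x))`, where the lambda stands in for a network request. The provider only ever sees blinded group elements, and checking each proof against the pre-published `pk` is what lets the user verify the integrity of every answer.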
