Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
Huanming Shen, Baizhou Huang, Xiaojun Wan

TL;DR
This paper introduces SEEK, a watermarking scheme for LLMs that improves resilience against both scrubbing and spoofing attacks by exploiting the redundancy of equivalent texture keys.
Contribution
It presents a watermarking mechanism that breaks the trade-off governed by watermark window size, improving resilience to scrubbing without weakening robustness to spoofing.
Findings
SEEK outperforms prior methods in robustness to both scrubbing and spoofing attacks.
Spoofing robustness improved by over 88%.
Scrubbing robustness increased by up to 24.6%.
Abstract
Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work breaks this trade-off by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a novel watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing the resilience against scrubbing attacks without compromising robustness to spoofing. Experiments demonstrate SEEK's superiority over prior methods, yielding spoofing robustness gains of…
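The window-size trade-off described in the abstract can be illustrated with a toy sketch. This is not SEEK and not the authors' code: it assumes a generic green-list watermark in which the preceding h tokens are hashed to choose a "green" half of the vocabulary. The vocabulary size, the key, and the function names (green_list, generate, green_fraction, scrub) are all hypothetical. It only shows why a larger window is more fragile under scrubbing: editing one token disturbs the detection of every position whose window contains it.

```python
import hashlib
import random

VOCAB_SIZE = 50   # toy vocabulary size (assumption for illustration)
GAMMA = 0.5       # fraction of the vocabulary marked "green" (assumption)

def green_list(window, key="secret"):
    """Pseudo-randomly pick the green tokens from the preceding window."""
    text = key + ":" + ",".join(map(str, window))
    digest = hashlib.sha256(text.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def generate(length, h, seed=0):
    """Toy 'model' that always emits a green token (a hard watermark)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(h)]  # unscored prefix
    for _ in range(length):
        greens = sorted(green_list(tokens[-h:]))
        tokens.append(rng.choice(greens))
    return tokens

def green_fraction(tokens, h):
    """Detector statistic: share of scored tokens that land in their green list."""
    hits = sum(tokens[i] in green_list(tokens[i - h:i])
               for i in range(h, len(tokens)))
    return hits / (len(tokens) - h)

def scrub(tokens, rate, seed=1):
    """Crude scrubbing attack: replace each token with a random one at `rate`."""
    rng = random.Random(seed)
    return [rng.randrange(VOCAB_SIZE) if rng.random() < rate else t
            for t in tokens]

if __name__ == "__main__":
    for h in (1, 4):
        marked = generate(2000, h)
        attacked = scrub(marked, rate=0.2)
        print(h, green_fraction(marked, h), green_fraction(attacked, h))
```

In this toy setting the green fraction after scrubbing falls closer to the unwatermarked baseline of GAMMA when h is larger. The spoofing side of the trade-off (small windows being easier to reverse-engineer from token statistics) is not modelled here.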
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques
