Invariant-based Robust Weights Watermark for Large Language Models
Qingxiao Guo, Xinjie Zhu, Yilong Ma, Hui Jin, Yunhao Wang, Weifeng Zhang, Xiaobing Guo

TL;DR
This paper presents a robust, retraining-free watermarking scheme for large language models that leverages model invariants and noise mechanisms to protect intellectual property against various attacks.
Contribution
It introduces a novel invariant-based watermarking method that does not require model retraining and can withstand multiple attack types in multi-user scenarios.
Findings
Demonstrates robustness against fine-tuning, pruning, and quantization attacks.
Remains effective in multi-user environments and resists collusion attacks.
Validated on Llama3, Phi3, and Gemma models with strong experimental results.
Abstract
Watermarking technology has gained significant attention due to the increasing importance of intellectual property (IP) rights, particularly with the growing deployment of large language models (LLMs) on billions of resource-constrained edge devices. To counter the threat of IP theft by malicious users, this paper introduces a robust watermarking scheme for transformer models that requires no retraining or fine-tuning. The scheme generates a unique key for each user and derives a stable watermark value by solving linear constraints constructed from model invariants. Moreover, it uses a noise mechanism to hide watermark locations in multi-user scenarios, protecting against collusion attacks. This paper evaluates the approach on three popular models (Llama3, Phi3, Gemma), and the experimental results confirm strong robustness across a range of attack methods (fine-tuning, pruning,…
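The abstract does not say which model invariants the scheme uses, so the following is only a minimal sketch of the general idea, not the authors' construction. It assumes one well-known transformer invariant: attention scores depend on the product W_q W_k^T, which is unchanged when W_q is multiplied by an invertible matrix A and W_k by the inverse transpose of A. The function names, the key "user-42", and the hash-seeded probe vectors are all hypothetical choices made for illustration.

```python
import hashlib

import numpy as np


def key_to_rng(user_key: str) -> np.random.Generator:
    """Derive a deterministic random generator from a user key (illustrative choice)."""
    seed = int.from_bytes(hashlib.sha256(user_key.encode()).digest()[:8], "big")
    return np.random.default_rng(seed)


def invariant_readout(w_q: np.ndarray, w_k: np.ndarray, user_key: str) -> float:
    """Key-dependent scalar computed only from the invariant product W_q W_k^T.

    Assumption: this readout stands in for a 'stable watermark value'; the paper's
    actual linear constraints are not given in the abstract.
    """
    rng = key_to_rng(user_key)
    d_model = w_q.shape[0]
    x = rng.standard_normal(d_model)  # key-derived probe vectors
    y = rng.standard_normal(d_model)
    return float(x @ (w_q @ w_k.T) @ y)


# Toy query/key projection weights for a single attention head.
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))

# Reparameterize the head with an invertible matrix A. The model's attention
# scores are unchanged because (W_q A)(W_k A^{-T})^T = W_q W_k^T.
a = rng.standard_normal((d_head, d_head)) + 4.0 * np.eye(d_head)
w_q_transformed = w_q @ a
w_k_transformed = w_k @ np.linalg.inv(a).T

before = invariant_readout(w_q, w_k, "user-42")
after = invariant_readout(w_q_transformed, w_k_transformed, "user-42")
other_user = invariant_readout(w_q, w_k, "user-43")

print(np.isclose(before, after))  # readout survives the reparameterization
print(before != other_user)       # a different key gives a different readout
```

Reading the value from an invariant quantity rather than from raw weights means a functionally equivalent rewrite of the weights does not erase it. How the paper extends this to fine-tuning, pruning, and quantization is not covered by this sketch.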
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting
