Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?
Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu

TL;DR
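This paper investigates whether LLM watermarks inherited through knowledge distillation withstand deliberate removal. It finds that the proposed removal methods, which paraphrase the training data before distillation or neutralize the watermark at inference time afterwards, can effectively neutralize watermarks while preserving knowledge transfer, highlighting the need for more resilient watermarking strategies.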
Contribution
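The study introduces and evaluates two categories of watermark removal methods: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN). It demonstrates their effectiveness and efficiency, and underscores the need for more robust watermarking in LLMs.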
Findings
Watermark removal methods can fully eliminate inherited watermarks.
Post-distillation watermark neutralization maintains knowledge transfer.
Watermark removal approaches are computationally efficient.
Abstract
The radioactive nature of Large Language Model (LLM) watermarking enables the detection of watermarks inherited by student models when trained on the outputs of watermarked teacher models, making it a promising tool for preventing unauthorized knowledge distillation. However, the robustness of watermark radioactivity against adversarial actors remains largely unexplored. In this paper, we investigate whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance. We propose two categories of watermark removal approaches: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN). Extensive experiments across multiple model pairs, watermarking schemes and hyper-parameter settings demonstrate…
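The abstract does not say which watermarking schemes are tested, so the following is only a toy illustration of how detection in a student model's output might work, assuming a green-list style scheme in which the previous token pseudo-randomly marks a fraction `gamma` of the vocabulary as "green". The names `is_green`, `detection_z_score`, and `toy_watermarked_sequence` are hypothetical and not taken from the paper.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, gamma: float = 0.5) -> bool:
    """Assumed toy rule: hash (prev_token, token) to [0, 1) and call it green if below gamma."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def detection_z_score(tokens, gamma: float = 0.5) -> float:
    """z-score of the green-token count against the no-watermark expectation."""
    pairs = list(zip(tokens, tokens[1:]))
    t = len(pairs)
    if t == 0:
        return 0.0
    green = sum(is_green(prev, cur, gamma) for prev, cur in pairs)
    # Without a watermark, each token is green with probability gamma.
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def toy_watermarked_sequence(length: int, vocab_size: int = 1000, gamma: float = 0.5):
    """Build a sequence in which every token after the first is green (an extreme watermark)."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        # Pick the first candidate token that falls in the green list for this context.
        tokens.append(next(c for c in range(vocab_size) if is_green(prev, c, gamma)))
    return tokens


if __name__ == "__main__":
    marked = toy_watermarked_sequence(201)
    print(f"z-score of fully watermarked toy text: {detection_z_score(marked):.2f}")
```

Under this assumed scheme, a student trained on watermarked teacher outputs would tend to over-produce green tokens and score a high z, which is the "radioactivity" the abstract refers to. Removal succeeds when the student's z-score falls back toward the range expected of unwatermarked text.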
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems · Data Quality and Management
Methods: Knowledge Distillation
