ModelShield: Adaptive and Robust Watermark against Model Extraction Attack
Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai, Minghu Jiang, Yongfeng Huang

TL;DR
ModelShield is an adaptive, robust watermarking technique for large language models that enables autonomous watermark embedding and detection, effectively protecting intellectual property without degrading content quality or requiring additional training.
Contribution
We propose ModelShield, a novel plug-and-play watermarking method with self-watermarking and robust detection mechanisms, improving IP protection for LLMs against extraction attacks.
Findings
Outperforms existing watermarking methods in robustness and effectiveness.
Maintains high content quality with minimal degradation.
Works across multiple datasets and LLM architectures.
Abstract
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks, thereby enhancing the commercial value of their intellectual property (IP). To protect this IP, model owners typically allow user access only in a black-box manner; however, adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content. However, existing watermarking methods often compromise the quality of generated content due to heuristic alterations and lack robust mechanisms to counteract adversarial strategies, thus limiting their practicality in real-world scenarios. In this paper, we introduce an adaptive and robust watermarking method (named ModelShield) to protect the IP…
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Vehicle License Plate Recognition · Automated Road and Building Extraction
