ModelShield: Adaptive and Robust Watermark against Model Extraction Attack

Kaiyi Pang; Tao Qi; Chuhan Wu; Minhao Bai; Minghu Jiang; Yongfeng Huang

arXiv:2405.02365·cs.CR·January 14, 2025·1 cites

TL;DR

ModelShield is an adaptive, robust watermarking technique for large language models that enables autonomous watermark embedding and detection, effectively protecting intellectual property without degrading content quality or requiring additional training.

Contribution

We propose ModelShield, a novel plug-and-play watermarking method with self-watermarking and robust detection mechanisms, improving IP protection for LLMs against extraction attacks.

Findings

01 Outperforms existing watermarking methods in robustness and effectiveness.

02 Maintains high content quality with minimal degradation.

03 Works across multiple datasets and LLM architectures.

Abstract

Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks, thereby enhancing the commercial value of their intellectual property (IP). To protect this IP, model owners typically allow user access only in a black-box manner; however, adversaries can still use model extraction attacks to steal the model intelligence encoded in the model's generated output. Watermarking technology offers a promising defense against such attacks by embedding unique identifiers into the model-generated content. However, existing watermarking methods often compromise the quality of generated content because they rely on heuristic alterations, and they lack robust mechanisms to counteract adversarial strategies, which limits their practicality in real-world scenarios. In this paper, we introduce an adaptive and robust watermarking method (named ModelShield) to protect the IP…

Code & Models

Repositories

amaoku/modelshield
PyTorch · Official

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Vehicle License Plate Recognition · Automated Road and Building Extraction