NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models

Haodong Zhao; Jinming Hu; Peixuan Li; Fangqi Li; Jinrui Sha; Tianjie Ju; Peixuan Chen; Zhuosheng Zhang; Gongshen Liu

arXiv:2410.13907·cs.CR·February 4, 2025

TL;DR

NSmark is a black-box watermarking framework for language models. It relies on the observation that the null space of the model's output matrix stays invariant under LL-LFEA, which allows ownership verification that resists this attack while preserving model performance.

Contribution

The paper extends LFEA to black-box settings by targeting last-layer outputs (LL-LFEA), and introduces NSmark, a task-agnostic black-box watermarking scheme that uses null space invariance to withstand LL-LFEA.

Findings

01. Effective resistance to LL-LFEA attacks demonstrated

02. High watermark embedding capacity with preserved model performance

03. Scalable and reliable verification across tasks

Abstract

Language models (LMs) have emerged as critical intellectual property (IP) assets that necessitate protection. Although various watermarking strategies have been proposed, they remain vulnerable to Linear Functionality Equivalence Attack (LFEA), which can invalidate most existing white-box watermarks without prior knowledge of the watermarking scheme or training data. This paper analyzes and extends the attack scenarios of LFEA to the commonly employed black-box settings for LMs by considering Last-Layer outputs (dubbed LL-LFEA). We discover that the null space of the output matrix remains invariant against LL-LFEA attacks. Based on this finding, we propose NSmark, a black-box watermarking scheme that is task-agnostic and capable of resisting LL-LFEA attacks. NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum…
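
The null space invariance mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' implementation; it assumes that LL-LFEA can be modeled as left-multiplying a last-layer output matrix `A` (one column per input) by an invertible matrix `Q`. The names `A`, `Q`, `null_space`, and the dimensions `d` and `n` are hypothetical, and random data stands in for real model outputs.

```python
import numpy as np


def null_space(matrix, tol=1e-10):
    """Return an orthonormal basis (as columns) of the null space of `matrix`."""
    _, s, vh = np.linalg.svd(matrix, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    # Rows of vh beyond the numerical rank span the null space.
    return vh[rank:].T


rng = np.random.default_rng(0)
d, n = 4, 10                       # assumed: output dimension d, n query inputs (n > d)
A = rng.standard_normal((d, n))    # stand-in for the last-layer output matrix
Q = rng.standard_normal((d, d))    # stand-in for the attacker's linear transform
assert np.linalg.matrix_rank(Q) == d   # the attack is assumed to be invertible

A_attacked = Q @ A                 # assumed model of LL-LFEA on the outputs

N = null_space(A)                  # basis of {x : A x = 0}, shape (n, n - d)

# The outputs change under the attack ...
outputs_changed = not np.allclose(A, A_attacked)
# ... but every null-space vector of A is still mapped to zero by Q @ A.
residual_original = np.abs(A @ N).max()
residual_attacked = np.abs(A_attacked @ N).max()

print(outputs_changed, residual_original < 1e-8, residual_attacked < 1e-8)
```

The reasoning behind the check is short: if `A x = 0`, then `(Q A) x = Q (A x) = 0`, and because `Q` is invertible the converse also holds, so `A` and `Q A` share the same null space even though their entries differ.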

Code & Models

Repositories

dongdongzhaoup/nsmark (PyTorch, official)

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis