NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models
Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Tianjie Ju, Peixuan Chen, Zhuosheng Zhang, Gongshen Liu

TL;DR
NSmark is a black-box watermarking framework for language models that uses the invariance of the output matrix's null space to resist last-layer linear functionality equivalence attacks (LL-LFEA), supporting ownership verification without degrading model performance.
Contribution
The paper introduces NSmark, a task-agnostic watermarking scheme that exploits null space properties to withstand LL-LFEA attacks in black-box settings, advancing watermark robustness.
Findings
Effective resistance to LL-LFEA attacks demonstrated
High watermark embedding capacity with preserved model performance
Scalable and reliable verification across tasks
Abstract
Language models (LMs) have emerged as critical intellectual property (IP) assets that necessitate protection. Although various watermarking strategies have been proposed, they remain vulnerable to Linear Functionality Equivalence Attack (LFEA), which can invalidate most existing white-box watermarks without prior knowledge of the watermarking scheme or training data. This paper analyzes and extends the attack scenarios of LFEA to the commonly employed black-box settings for LMs by considering Last-Layer outputs (dubbed LL-LFEA). We discover that the null space of the output matrix remains invariant against LL-LFEA attacks. Based on this finding, we propose NSmark, a black-box watermarking scheme that is task-agnostic and capable of resisting LL-LFEA attacks. NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum…
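The null space invariance claim in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' implementation. It assumes the outputs of the model on m verification inputs are stacked as columns of an n × m matrix A with m > n, and that LL-LFEA acts as left-multiplication of those outputs by an invertible n × n matrix Q. The names `null_space_basis`, `A`, `Q`, and `N` are hypothetical and chosen for this example.

```python
import numpy as np


def null_space_basis(a, tol=1e-10):
    """Return an orthonormal basis (as columns) of {x : a @ x = 0} via the SVD."""
    _, s, vt = np.linalg.svd(a, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    # Rows of vt beyond the numerical rank span the null space of a.
    return vt[rank:].T


rng = np.random.default_rng(0)
n, m = 8, 20  # assumed: output dimension n, number of verification inputs m > n

# Assumed layout: each column of A is the last-layer output for one input.
A = rng.standard_normal((n, m))

# Assumed attack model: an invertible linear map applied to every output.
# A random Gaussian matrix is invertible with probability 1.
Q = rng.standard_normal((n, n))
A_attacked = Q @ A

# Null space basis computed from the original (unattacked) outputs.
N = null_space_basis(A)

# If A @ N = 0, then (Q @ A) @ N = Q @ (A @ N) = 0 as well.
residual_original = np.abs(A @ N).max()
residual_attacked = np.abs(A_attacked @ N).max()

# Control: the null space of an unrelated output matrix does not annihilate A.
N_other = null_space_basis(rng.standard_normal((n, m)))
residual_unrelated = np.abs(A @ N_other).max()

print(residual_original, residual_attacked, residual_unrelated)
```

Under these assumptions both the original and the attacked outputs are annihilated by N up to floating-point error, while the control residual is far from zero. This is the property that lets a null space computed before the attack still serve as a verification signal afterwards.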
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
