Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks
Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Yuan Liu, Mohan Li, and Zhihong Tian

TL;DR
Neural Honeytrace is a training-free, plug-and-play watermarking framework that protects neural networks against model extraction attacks. It leverages the long-tailed effect of backdoor learning to embed watermarks robustly and reduces the number of queries needed for ownership verification.
Contribution
It introduces a novel, training-free watermarking method based on an information-theoretic transmission strategy, improving flexibility and robustness over existing approaches.
Findings
Reduces the number of verification queries to as low as 2% of that required by existing methods.
Operates without any retraining or additional training costs.
Demonstrates robustness against adaptive model extraction attacks.
Abstract
Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and their lack of clear theoretical foundations makes them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective and design a training-free, multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as low as 2% of that required by existing methods, while incurring zero training cost.
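The abstract measures efficiency by the number of queries a worst-case t-test-based ownership verification needs. The exact test protocol is not described here, so what follows is only a minimal sketch of a generic one-sided, one-sample t-test under stated assumptions: the verifier sends trigger queries to a suspect model, records 1 when the returned label matches the watermark target and 0 otherwise, and tests whether that hit rate exceeds a hypothetical baseline rate `p0` expected from an independent model. The helper names (`one_sided_t_test`, `queries_needed`) and the noise-free idealization inside `queries_needed` are illustrative assumptions, not Neural Honeytrace's method. The Student-t tail probability is computed from the regularized incomplete beta function so the sketch needs only the standard library.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) of Student's t with df degrees of freedom."""
    two_sided = _betai(df / 2.0, 0.5, df / (df + t * t))  # equals P(|T| > |t|)
    return 0.5 * two_sided if t >= 0 else 1.0 - 0.5 * two_sided


def one_sided_t_test(hits, p0):
    """Hypothetical verifier: test H0 mean(hits) <= p0 against H1 mean(hits) > p0."""
    n = len(hits)
    if n < 2:
        raise ValueError("need at least two queries")
    mean = sum(hits) / n
    var = sum((h - mean) ** 2 for h in hits) / (n - 1)  # unbiased sample variance
    if var == 0.0:  # all answers identical: the t statistic is degenerate
        return (math.inf, 0.0) if mean > p0 else (-math.inf, 1.0)
    t = (mean - p0) / math.sqrt(var / n)
    return t, t_sf(t, n - 1)


def queries_needed(p_wm, p0, alpha=0.05, n_max=100_000):
    """Smallest n at which the idealized test rejects H0 at level alpha.

    Assumption: the n observed hits have mean exactly p_wm and sample variance
    p_wm * (1 - p_wm) * n / (n - 1), i.e. Bernoulli statistics without noise.
    """
    if not 0.0 < p_wm < 1.0 or p_wm <= p0:
        raise ValueError("need p0 < p_wm < 1")
    for n in range(2, n_max + 1):
        var = p_wm * (1.0 - p_wm) * n / (n - 1)
        t = (p_wm - p0) / math.sqrt(var / n)
        if t_sf(t, n - 1) < alpha:
            return n
    return None
```

For example, under this idealization a suspect model whose trigger hit rate is 0.9 against a baseline of 0.1 is flagged at the 5% level after 3 queries, while hit rates closer to the baseline need more queries. In this toy model, a watermark that stays strong in the extracted model is what makes verification cheap.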
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
