Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks

Yixiao Xu; Binxing Fang; Rui Wang; Yinghai Zhou; Yuan Liu; Mohan Li; and Zhihong Tian

arXiv:2501.09328 · cs.CR · January 22, 2026

TL;DR

Neural Honeytrace is a training-free, plug-and-play watermarking framework that lets model owners assert ownership against model extraction attacks. It leverages the long-tailed effect of backdoor learning to embed watermarks efficiently and robustly, and it sharply reduces the number of queries needed for ownership verification.

Contribution

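The paper redefines the watermark transmission mechanism from an information perspective and designs a training-free, multi-step transmission strategy. Because no retraining is needed, the watermark can be applied after deployment, and the approach is more robust than existing methods that require additional training.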

Findings

01

Reduces the average number of queries required for worst-case t-test-based ownership verification to as low as 2% of that required by existing methods.

02

Operates without any retraining or additional training costs.

03

Demonstrates robustness against adaptive model extraction attacks.

Abstract

Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and the lack of clear theoretical foundations makes them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective, designing a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as low as $2\%$ of existing methods, while incurring zero training cost.
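The abstract refers to t-test-based ownership verification but this page does not spell out the procedure. The following is a minimal sketch of how such a check could look in general, not the authors' implementation. It assumes the verifier records the confidence a suspect model assigns to a watermark target class on triggered and on clean queries, then compares the two samples with Welch's t statistic. The function names, the sample values, and the decision threshold of 3.0 are all hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for mean(sample_a) - mean(sample_b)."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Squared standard error of each sample mean (unbiased sample variance / n).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    return (mean_a - mean_b) / math.sqrt(se2_a + se2_b)


def claims_ownership(triggered_conf, clean_conf, t_threshold=3.0):
    """One-sided decision: is the target-class confidence significantly
    higher on triggered queries than on clean ones?

    t_threshold is a hypothetical cutoff chosen for illustration only.
    """
    return welch_t(triggered_conf, clean_conf) > t_threshold


# Hypothetical target-class confidences collected from query responses.
clean = [0.11, 0.09, 0.12, 0.10, 0.08, 0.13, 0.10, 0.11]
triggered_stolen = [0.62, 0.58, 0.65, 0.60, 0.57, 0.63, 0.61, 0.59]
triggered_independent = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.12, 0.11]

print(claims_ownership(triggered_stolen, clean))       # extracted model
print(claims_ownership(triggered_independent, clean))  # unrelated model
```

In this toy data the extracted model reacts strongly to the trigger, so the statistic is large, while the unrelated model shows no shift. A real verification would normally convert the statistic to a p-value (for example with `scipy.stats.ttest_ind(..., equal_var=False)`) and fix the significance level in advance. The fewer queries needed to reach that level, the cheaper the verification, which is the quantity the abstract reports.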

Code & Models

Repositories

neurht/neurht
PyTorch · Official

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
