Towards Generalized and Stealthy Watermarking for Generative Code Models

Haoxuan Li; Jiale Zhang; Xiaobing Sun; Xiapu Luo

arXiv:2506.20926·cs.CR·July 1, 2025

TL;DR

This paper introduces CodeGuard, a backdoor watermarking technique for generative code models that improves verification reliability and stealthiness, protecting intellectual property without degrading model performance.

Contribution

CodeGuard combines attention mechanisms with distributed trigger embedding and homoglyph character replacement to improve the generalization, stealthiness, and robustness of backdoor watermarks in generative code models (GCMs).
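To make the idea of a distributed trigger built from look-alike (homoglyph) characters concrete, here is a minimal sketch. It is not the paper's implementation: the `HOMOGLYPHS` table, the function `embed_distributed_trigger`, and the hard-coded `positions` are hypothetical, and the attention-based step that would choose those positions is not shown.

```python
# Hypothetical homoglyph table: Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}


def embed_distributed_trigger(tokens, positions):
    """Swap the first replaceable character in each selected token.

    The trigger is spread over several tokens instead of being inserted
    as one contiguous, easily spotted string.
    """
    out = list(tokens)  # leave the caller's list untouched
    for i in positions:
        for j, ch in enumerate(out[i]):
            if ch in HOMOGLYPHS:
                out[i] = out[i][:j] + HOMOGLYPHS[ch] + out[i][j + 1:]
                break  # one replacement per token
    return out


tokens = ["def", "compute", "total", "(", "price", ",", "count", ")"]
# Assumption: in a real system these indices would come from an
# attention-based selector; here they are fixed for illustration.
positions = [1, 4]
marked = embed_distributed_trigger(tokens, positions)
```

The marked tokens render almost identically to the originals on screen, yet they differ at the code-point level, which is what lets a model learn them as a trigger.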

Findings

01

Achieves up to 100% verification accuracy across tasks

02

Maintains primary task performance without degradation

03

Is detected at a rate of only 0.078 by the ONION detection method
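A verification accuracy like the one reported above can be read as the share of triggered queries whose output carries the expected watermark. The sketch below illustrates that generic black-box check only; the function `verification_rate`, the sample outputs, and the watermark string `wm-7f3a` are hypothetical, and the paper's actual verification procedure is not described on this page.

```python
def verification_rate(outputs, watermark):
    """Return the fraction of model outputs that contain the watermark string."""
    if not outputs:
        return 0.0  # no queries, so nothing verified
    hits = sum(1 for text in outputs if watermark in text)
    return hits / len(outputs)


# Hypothetical outputs from a suspect model queried with triggered inputs.
outputs = [
    "return total  # wm-7f3a",
    "return count  # wm-7f3a",
    "return price",
]
rate = verification_rate(outputs, "wm-7f3a")
```

In this toy case two of the three outputs contain the watermark; an owner would compare such a rate against the rate observed on clean inputs before claiming ownership.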

Abstract

Generative code models (GCMs) significantly enhance development efficiency through automated code generation and code summarization. However, building and training these models require computational resources and time, necessitating effective digital copyright protection to prevent unauthorized leaks and misuse. Backdoor watermarking, by embedding hidden identifiers, simplifies copyright verification by breaking the model's black-box nature. Current backdoor watermarking techniques face two main challenges: first, limited generalization across different tasks and datasets, causing fluctuating verification rates; second, insufficient stealthiness, as watermarks are easily detected and removed by automated methods. To address these issues, we propose CodeGuard, a novel watermarking method combining attention mechanisms with distributed trigger embedding strategies. Specifically, CodeGuard…

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Adversarial Robustness in Machine Learning