Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo; Huaisheng Zhu; Siyuan Xu; Hangfan Zhang; Teng Xiao; Minhao Cheng

arXiv:2508.11925 · cs.CR · November 4, 2025

Open Access

TL;DR

This paper presents CodeTracer, a reinforcement learning-based framework for embedding watermarks in code generated by large language models, ensuring detectability while preserving code functionality.

Contribution

It introduces a novel RL training paradigm with a policy-driven approach and Gumbel reparameterization for effective, subtle code watermarking.
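Gumbel reparameterization is commonly realized through the Gumbel-max trick and its softmax relaxation. The sketch below illustrates that general idea only; it is not taken from the paper. The function names (`sample_gumbel`, `gumbel_max_sample`, `gumbel_softmax`) and the temperature value are assumptions chosen for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw standard Gumbel(0, 1) noise via the inverse transform -log(-log(U))."""
    # rng.random() lies in [0, 1); clamp away from 0 so log(u) is defined.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax(logits + Gumbel noise) is a sample from softmax(logits)."""
    noisy = [x + sample_gumbel(rng) for x in logits]
    return max(range(len(logits)), key=noisy.__getitem__)


def gumbel_softmax(logits, rng, temperature=0.5):
    """Relaxed sample: softmax of the noisy logits instead of a hard argmax.

    In an autodiff framework this relaxation is what lets gradients flow
    through the token choice; here it only shows the arithmetic.
    """
    noisy = [x + sample_gumbel(rng) for x in logits]
    return softmax(noisy, temperature)
```

Lower temperatures push the relaxed output toward a one-hot vector, while higher temperatures smooth it toward uniform.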

Findings

01. CodeTracer outperforms existing methods in watermark detectability.

02. It maintains code functionality with minimal disruption.

03. The framework effectively balances watermark strength and code correctness.

Abstract

Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ…
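To make "biasing token choices" and "statistically detectable deviations" concrete, here is a minimal sketch that swaps in a much simpler technique than the one described: a fixed, hash-seeded green list with a constant logit bias and a one-proportion z-test for detection. It is not CodeTracer's learned policy. The names and constants (`VOCAB_SIZE`, `GAMMA`, `DELTA`, `is_green`, `generate`, `z_score`, the `demo-key` secret) are hypothetical, and random logits stand in for a real language model.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # hypothetical toy vocabulary size
GAMMA = 0.5      # assumed fraction of the vocabulary marked "green"
DELTA = 4.0      # assumed logit bias added to green tokens


def is_green(prev_token, token, key="demo-key"):
    """Pseudo-randomly assign a token to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def sample_token(logits, rng):
    """Sample a token index from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(length, rng, watermark):
    """Generate `length` tokens after a start token, optionally biasing toward green tokens."""
    tokens = [0]
    for _ in range(length):
        # Stand-in for model logits: random values, not a real LLM.
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        if watermark:
            logits = [
                x + DELTA if is_green(tokens[-1], t) else x
                for t, x in enumerate(logits)
            ]
        tokens.append(sample_token(logits, rng))
    return tokens


def z_score(tokens):
    """One-proportion z-test: how far the green count sits above the GAMMA baseline."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(prev, tok) for prev, tok in pairs)
    n = len(pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Watermarked sequences should score far above unwatermarked ones under this test. A fixed bias like `DELTA` ignores whether a token is syntactically required, which is the kind of limitation a learned, execution-aware policy is meant to address in code.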

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Vehicle License Plate Recognition · Digital Rights Management and Security