SWaRL: Safeguard Code Watermarking via Reinforcement Learning

Neusha Javidnia; Ruisi Zhang; Ashish Kundu; Farinaz Koushanfar

arXiv:2601.02602·cs.CR·May 11, 2026

TL;DR

SWaRL is a reinforcement learning-based framework that embeds robust, verifiable watermarks in code generated by large language models. It aims to preserve functional correctness, keep the watermark detectable, and resist removal attacks.

Contribution

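SWaRL proposes an RL-based co-training approach in which compiler feedback rewards functional correctness and a jointly trained confidential verifier rewards watermark detectability. Fine-tuning with low-rank adaptation (LoRA) lets the watermarking behavior transfer across updates to the code LLM.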
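The idea of combining compiler feedback with a verifier score into one reward can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the weights `w_correct` and `w_detect`, the choice to zero out the reward when compilation fails, and the toy verifier. Python's built-in `compile()` stands in for compiler feedback here, and it checks syntax only, not functional correctness.

```python
from typing import Callable


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python.

    Stand-in for compiler feedback; a real setup would likely also run tests.
    """
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def combined_reward(
    source: str,
    verifier: Callable[[str], float],
    w_correct: float = 0.5,  # hypothetical weight for the correctness term
    w_detect: float = 0.5,   # hypothetical weight for the detectability term
) -> float:
    """Hypothetical scalar reward for one generated program.

    Returns 0.0 if the code does not compile; otherwise a weighted sum of a
    fixed correctness bonus and the verifier score clamped to [0, 1].
    """
    if not compiles(source):
        return 0.0
    score = min(max(verifier(source), 0.0), 1.0)
    return w_correct * 1.0 + w_detect * score


def toy_verifier(source: str) -> float:
    """Toy placeholder for a learned verifier: looks for a marker substring."""
    return 1.0 if "_wm" in source else 0.0


if __name__ == "__main__":
    for snippet in ["total_wm = 1 + 2", "total = 1 + 2", "def broken(:"]:
        print(repr(snippet), combined_reward(snippet, toy_verifier))
```

In an RL fine-tuning loop, a scalar like this would be computed per sampled completion and passed to the policy update. The gating on compilation is one possible design choice: it keeps the policy from trading correctness for detectability, at the cost of a sparser signal early in training.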

Findings

01 Achieves high watermark detection accuracy

02 Maintains full code functionality after watermarking

03 Resilient against refactoring and adversarial attacks

Abstract

We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLMs by embedding unique and verifiable signatures in the generated program. Existing watermarking approaches either rely on handcrafted code transformations or manipulate token generation probabilities at inference time, making them vulnerable to removal attacks or prone to breaking functional correctness. To address these challenges, SWaRL employs a reinforcement learning-based co-training framework that uses compiler feedback for functional correctness and a jointly trained confidential verifier as a reward signal to maintain watermark detectability. Furthermore, SWaRL employs low-rank adaptation (LoRA) during fine-tuning, enabling efficient integration of watermarking behavior and transferability across model updates. Extensive experiments show that SWaRL…
