R-CoT: A Reasoning-Layer Watermark via Redundant Chain-of-Thought in Large Language Models
Ziming Zhang, Li Li, Guorui Feng, Hanzhou Wu, Xinpeng Zhang

TL;DR
This paper proposes R-CoT, a reasoning-layer watermarking method for large language models that embeds the watermark in the model's reasoning path rather than in its output distribution, making the watermark more robust to output perturbations.
Contribution
The paper introduces a GRPO-based dual-trajectory optimization mechanism that lets the native and watermark reasoning paths coexist in a shared parameter space, aiming for a robust watermark with minimal impact on model performance.
Findings
R-CoT achieves a true positive rate above 95% after fine-tuning.
Watermarks embedded via R-CoT are resistant to output perturbations.
The method outperforms existing watermarking techniques in robustness.
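For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a true positive rate is computed for a watermark detector. The function name and the detection outcomes are hypothetical and only illustrate the arithmetic; they are not results or code from the paper.

```python
def true_positive_rate(detections):
    """Fraction of watermarked samples that the detector flags as watermarked."""
    if not detections:
        raise ValueError("need at least one watermarked sample")
    return sum(1 for flagged in detections if flagged) / len(detections)


# Hypothetical outcomes: 20 queries to a watermarked model, 19 detected.
detections = [True] * 19 + [False]
tpr = true_positive_rate(detections)
print(f"TPR = {tpr:.2%}")
```

Every sample in the list is assumed to come from a watermarked model, so the denominator counts true positives plus false negatives.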
Abstract
Large language models (LLMs) are widely deployed across many scenarios owing to their reasoning capabilities. To prevent these models from being misused, watermarking is commonly employed to establish ownership. However, most existing watermarking methods rely on superficial modifications to the model's output distribution, leaving the watermark vulnerable to perturbation and removal. To overcome this challenge, this paper introduces a reasoning-layer framework termed Redundant Chain-of-Thought (R-CoT), which embeds watermarks into the reasoning path. A dual-trajectory optimization mechanism based on GRPO enables the native and watermark reasoning paths to coexist within a shared parameter space, internalizing the watermark as a distinct reasoning policy. As a result, the watermark is embedded into the model's stable reasoning path, avoiding the watermark failure caused by output-level…
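The abstract names GRPO (Group Relative Policy Optimization) as the basis of the dual-trajectory mechanism. As a rough illustration of the group-relative advantage that GRPO is built on, the sketch below standardizes each sampled completion's reward against its own group. Treating the native and watermark trajectories as two separately sampled groups is an assumption made here for illustration; the reward values, the variable names, and the use of population standard deviation are likewise hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled per trajectory.
native_rewards = [1.0, 0.0, 1.0, 0.0]     # assumed: task-correctness reward
watermark_rewards = [0.2, 0.9, 0.4, 0.9]  # assumed: watermark-path reward

# Each trajectory is normalized within its own group, so the two reward
# scales do not interfere with each other.
native_adv = group_relative_advantages(native_rewards)
watermark_adv = group_relative_advantages(watermark_rewards)
```

Because advantages are centered within each group, completions are only compared against others sampled for the same trajectory, which is one plausible way two reasoning policies could be trained side by side in a single model.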
