CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems
Yan Wen, Junfeng Guo, Heng Huang

TL;DR
CoTGuard introduces a trigger-based detection framework that monitors intermediate reasoning steps in multi-agent LLM systems to effectively identify copyright violations without disrupting task performance.
Contribution
This paper presents CoTGuard, a novel approach leveraging trigger-based detection within Chain-of-Thought reasoning for fine-grained copyright protection in multi-agent LLM systems.
Findings
Effective detection of content leakage with minimal task interference
Fine-grained, interpretable monitoring of reasoning processes
Demonstrated robustness across various benchmarks
Abstract
As large language models (LLMs) evolve into autonomous agents capable of collaborative reasoning and task execution, multi-agent LLM systems have emerged as a powerful paradigm for solving complex problems. However, these systems pose new challenges for copyright protection, particularly when sensitive or copyrighted content is inadvertently recalled through inter-agent communication and reasoning. Existing protection techniques primarily focus on detecting content in final outputs, overlooking the richer, more revealing reasoning processes within the agents themselves. In this paper, we introduce CoTGuard, a novel framework for copyright protection that leverages trigger-based detection within Chain-of-Thought (CoT) reasoning. Specifically, we can activate specific CoT segments and monitor intermediate reasoning steps for unauthorized content reproduction by embedding specific trigger…
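The monitoring idea in the abstract, scanning intermediate reasoning steps for a trigger pattern, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated and does not specify the similarity measure or threshold, so token-level Jaccard overlap, the 0.5 threshold, the function names, and the example trigger phrase are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_steps(steps: list[str], trigger_pattern: str,
               threshold: float = 0.5) -> list[int]:
    """Indices of reasoning steps whose similarity to the
    (hypothetical) trigger pattern meets the threshold."""
    return [i for i, step in enumerate(steps)
            if jaccard(step, trigger_pattern) >= threshold]


# Hypothetical chain-of-thought trace from one agent.
steps = [
    "First, restate the question in simpler terms.",
    "Recall the silver lantern rule before answering.",
    "Therefore the answer is 12.",
]
flagged = flag_steps(steps, "recall the silver lantern rule")
print(flagged)
```

In this sketch only the second step is flagged, since it shares most of its tokens with the trigger phrase. A real detector would likely need a more robust measure, such as embedding similarity, to survive paraphrasing across agents.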
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
1. This paper breaks new ground by tackling a largely overlooked issue: copyright leakage within the reasoning processes of multi-agent LLM systems. This is a forward-looking contribution, especially as such systems become more common in real-world applications. 2. The idea of using Chain-of-Thought as a watermarking medium is particularly interesting. It cleverly merges model interpretability with copyright protection into one cohesive framework. 3. Unlike many perturbation-based defense methods
1. The proposed method lacks strong innovation. The idea of using trigger-based patterns within Chain-of-Thought reasoning is interesting but not conceptually new or technically groundbreaking. 2. The paper does not provide enough detail about the trigger detection process. Key aspects such as similarity metrics, detection thresholds, and robustness analysis are missing, making the method hard to reproduce and evaluate. 3. Although the paper claims to focus on multi-agent LLM systems, the
- The paper identifies a novel and important challenge: copyright protection in multi-agent LLM systems that employ CoT reasoning. - The proposed trigger-based mechanism is inspiring, as it guides CoT reasoning using task-specific trigger patterns, making it easier to monitor and detect copyrighted content leakage during multi-agent interactions. - Experimental results show that CoTGuard achieves clearly better defense performance while minimizing degradation in task performance compared
- The paper lacks evidence or analysis demonstrating that the influence of trigger patterns remains while propagating through multiple agents and CoT reasoning steps. - Section 4.3 requires more explanation and clarification. For example: How are the “Syntax, Semantics, and Embedding-Based Detection” methods actually evaluated? Which embeddings are used, and how is the embedding-based similarity calculated? How is $\hat{r}_i$ parsed for candidate trigger patterns? - The impact of
This paper calls the method a guard, though it is most easily understood as a watermarking procedure (lines 244, Section 4, Section 3.2 refer to the method as "trigger-based watermarking"). - The idea of watermarking intermediate traces is interesting and is part of a large set of new issues around controlling reasoning models. - Choice of evaluation datasets is reasonable for Chain-of-thought, though not "full" reasoning models such as DeepSeek or similar commercial models.
This paper has issues around framing, methodology, and clarity that I think can only be resolved in a future submission.
## Inaccurate/Overclaiming
W1. The abstract, introduction, and title of this paper suggest that it is a method for copyright protection. Rather, it is a detection method centered around embedding watermarks in chain of thought traces. This is provenance tracking or model fingerprinting, not a protection method.
W2. The paper is initially unclear about which aspects of copyr
- **Novel problem**: The paper introduces a novel and important research problem: the IP protection of the reasoning process (CoT) itself, rather than just the model’s training data or final output. This is a forward-looking concern as agentic systems and their reasoning traces become more valuable. - **Sound evaluating framework**: The paper proposes an end-to-end framework, detailing both the watermark injection mechanism (Trigger-CoT Prompt Construction) and the detection algorithm within a m
- **Misleading Framing and Missing Threat Model**: This is the most significant weakness. The term “Copyright Protection” is a misnomer. The framework does not protect (i.e., prevent) leakage; it is a detection and watermarking framework for tracing the unauthorized use of reasoning IP. This framing is confusing and misrepresents the paper’s core contribution. The paper fails to formalize its threat model. It vaguely refers to “unauthorized reuse” but does not clearly define the attacker’s capa
Taxonomy
Topics: Digital Rights Management and Security
Methods: Focus
