TL;DR
AgentMark is a behavioral watermarking framework for LLM-based agents that embeds multi-bit identifiers into planning decisions, enabling IP protection without degrading agent utility.
Contribution
The paper presents a distribution-preserving watermarking method for the high-level planning behaviors of black-box agents, a layer that content watermarking does not identify directly.
Findings
The watermark carries multi-bit identifiers across the evaluated environments.
Identifiers are recovered robustly from partial logs in experiments.
Agent utility is preserved under watermarking.
Abstract
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying…
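As a rough illustration of what "distribution-preserving" can mean for planning decisions, the sketch below picks an action by inverse-CDF sampling driven by a keyed pseudorandom number instead of a fresh random draw. It is not AgentMark's procedure, which the excerpt above does not spell out; it is a generic keyed-sampling scheme under stated assumptions. The names `keyed_uniform`, `pick`, `embed_step`, and `decode_bit`, the HMAC construction, the one-bit half-shift encoding, and the assumption that the verifier sees the same behavior distribution as the embedder are all hypothetical choices made for this example.

```python
import hashlib
import hmac
from bisect import bisect_right
from itertools import accumulate


def keyed_uniform(key: bytes, context: str, step: int) -> float:
    """Hypothetical helper: map (key, context, step) to a number in [0, 1)."""
    digest = hmac.new(key, f"{context}|{step}".encode(), hashlib.sha256).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly < 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def pick(probs, u: float) -> int:
    """Inverse-CDF choice: return the index whose CDF interval contains u."""
    cdf = list(accumulate(probs))
    # Clamp in case rounding leaves the last CDF value slightly below 1.
    return min(bisect_right(cdf, u), len(probs) - 1)


def embed_step(key: bytes, context: str, step: int, probs, bit: int) -> int:
    """Choose an action index; the payload bit shifts u by half a cycle.

    Because u is (pseudo)uniform on [0, 1), so is the shifted value, so the
    chosen action still follows `probs` to anyone who lacks the key.
    """
    u = (keyed_uniform(key, context, step) + 0.5 * bit) % 1.0
    return pick(probs, u)


def decode_bit(key: bytes, context: str, log) -> int:
    """Majority-vote the bit from a possibly partial log.

    `log` is a list of (step, probs, observed_action) tuples. Assumes the
    verifier can see the same distribution the embedder used.
    """
    votes = [0, 0]
    for step, probs, action in log:
        for bit in (0, 1):
            if embed_step(key, context, step, probs, bit) == action:
                votes[bit] += 1
    return 0 if votes[0] >= votes[1] else 1


if __name__ == "__main__":
    key = b"demo-key"            # assumed secret shared with the verifier
    probs = [0.5, 0.3, 0.2]      # e.g. an elicited distribution over three tools
    actions = [embed_step(key, "task-1", t, probs, bit=1) for t in range(10)]
    # Keep only every third step to mimic a partial log.
    partial = [(t, probs, actions[t]) for t in range(0, 10, 3)]
    print(actions, decode_bit(key, "task-1", partial))
```

The design point is that the only change from ordinary sampling is where the uniform number comes from, which is why small per-step distortions do not accumulate over long runs in a scheme of this kind. A real multi-bit scheme would need more than the single repeated bit used here.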
