Sequential Behavioral Watermarking for LLM Agents
Hyeseon An, Shinwoo Park, Dongsu Kim, Yo-Sub Han

TL;DR
SeqWM introduces a robust, sequence-based behavioral watermarking method for LLM agents that embeds signals into transition patterns, enabling reliable detection and ownership verification even under trajectory perturbations.
Contribution
The paper proposes SeqWM, a sequential behavioral watermarking framework that captures trajectory structure and remains robust to trajectory corruption and truncation.
Findings
SeqWM achieves consistent detection across diverse benchmarks.
SeqWM preserves agent utility while embedding watermarks.
SeqWM outperforms existing methods under trajectory perturbations.
Abstract
LLM-based agents act through sequences of executable decisions, but their trajectories provide little evidence of which agent or policy produced them, making provenance, ownership, and unauthorized reuse difficult to establish from observed behavior alone. This motivates watermarking signals embedded directly into agent behavior rather than only into generated text, since text watermarking cannot capture the action-level decisions that define agent execution. Recent agent watermarking methods address this gap by moving the watermark from generated text to behavioral choices. However, by treating each action step as an independent trial, they overlook trajectory structure and become fragile when trajectories are perturbed, truncated, or observed without reliable alignment. We propose SeqWM, a sequential behavioral watermarking framework that embeds signals into history-conditioned…
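The idea of placing the signal in transitions rather than in independent steps can be illustrated with a toy sketch. This is not the paper's algorithm, since the abstract is cut off before the mechanism is described. It assumes a generic keyed "green set" scheme in which the previous action and a secret key select a preferred half of the next actions. The action names, `green_set`, `sample_trajectory`, `detect`, and the demo key are all hypothetical.

```python
import hashlib
import hmac
import math
import random

# Hypothetical action vocabulary; a real agent would have its own tool/action set.
ACTIONS = ["search", "click", "read", "write", "submit", "back"]
GAMMA = 0.5  # fraction of next actions marked "green" for each previous action


def green_set(key: bytes, prev_action: str) -> set:
    """Keyed partition: rank candidate next actions by HMAC and keep the top half."""
    ranked = sorted(
        ACTIONS,
        key=lambda a: hmac.new(key, f"{prev_action}->{a}".encode(), hashlib.sha256).digest(),
    )
    return set(ranked[: int(len(ACTIONS) * GAMMA)])


def sample_trajectory(key: bytes, length: int, bias: float, rng: random.Random) -> list:
    """Toy agent: with probability `bias`, pick the next action from the green set."""
    traj = [rng.choice(ACTIONS)]
    for _ in range(length - 1):
        if rng.random() < bias:
            candidates = sorted(green_set(key, traj[-1]))  # sorted for reproducibility
        else:
            candidates = ACTIONS
        traj.append(rng.choice(candidates))
    return traj


def detect(key: bytes, traj: list) -> float:
    """z-score of the green-transition count against a Binomial(n, GAMMA) null."""
    n = len(traj) - 1
    if n <= 0:
        return 0.0
    hits = sum(a in green_set(key, p) for p, a in zip(traj, traj[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    rng = random.Random(0)
    key = b"demo-key"  # hypothetical secret
    marked = sample_trajectory(key, 200, bias=1.0, rng=rng)
    plain = sample_trajectory(key, 200, bias=0.0, rng=rng)
    print("marked:", detect(key, marked))
    print("plain:", detect(key, plain))
    # A contiguous slice needs no step alignment: each transition is scored
    # from its own local history, not from its absolute position.
    print("truncated:", detect(key, marked[50:150]))
```

Because each transition is scored from the preceding action alone, a contiguous fragment of a marked trajectory still carries the signal, which is the kind of robustness to truncation and misalignment that the abstract says per-step schemes lack. A practical scheme would condition on richer history and would have to trade the bias strength against agent utility.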
