Watermarking Should Be Treated as a Monitoring Primitive
Toluwani Aremu, Nils Lukas, Jie Zhang

TL;DR
This paper argues that watermarking in generative models should be treated as a monitoring primitive, and that evaluating it only against per-sample evasion or false-positive attacks misses what observers can infer by aggregating watermark signals across outputs.
Contribution
It introduces an observer-based threat model and demonstrates how watermark signals can be aggregated for entity-level attribution, highlighting the dual-use nature of watermarking.
Findings
Even zero-bit watermarking enables entity-level attribution in multi-key settings.
Persistent, key-dependent statistical structure can let external monitoring emerge over time.
Watermark design determines whether such external monitoring emerges and shapes the trade-off between attribution and detectability.
Abstract
Watermarking is widely proposed for provenance, attribution, and safety monitoring in generative models, yet is typically evaluated only under adversaries who attempt to evade detection or induce false positives at the level of individual samples. We argue that watermarking should be treated as a monitoring primitive, and that internal monitoring is unavoidable given per-entity attribution keys and messages, as well as detector access. We introduce an observer-based threat model in which observers can aggregate watermark signals across outputs to infer entity-level information, showing that even zero-bit watermarking enables attribution under multi-key settings. We further show that external monitoring can emerge over time from persistent, key-dependent statistical structure, although this depends on watermark design and may be mitigated by distribution-preserving or undetectable…
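To make the aggregation idea concrete, here is a toy sketch of how an observer with detector access might combine weak per-output signals into an entity-level attribution. It is not the paper's method. It assumes, purely for illustration, that each entity has its own key, that a zero-bit detector returns a z-score that is roughly standard normal when the key does not match, and that a matching key shifts the mean by a small `signal`. The function names, the `signal` value, and the threshold are all hypothetical.

```python
import math
import random


def simulate_scores(true_key, num_keys, num_outputs, signal=0.5, seed=0):
    """Toy detector z-scores: one row per output, one column per candidate key.

    Assumption: scores are N(0, 1) for non-matching keys and N(signal, 1)
    for the key that actually watermarked the outputs.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(signal if k == true_key else 0.0, 1.0) for k in range(num_keys)]
        for _ in range(num_outputs)
    ]


def aggregate(scores):
    """Stouffer-style combination: sum z-scores over outputs, divide by sqrt(n)."""
    n = len(scores)
    num_keys = len(scores[0])
    return [sum(row[k] for row in scores) / math.sqrt(n) for k in range(num_keys)]


def attribute(scores, threshold=4.0):
    """Return the index of the best-scoring key, or None if it is below threshold."""
    combined = aggregate(scores)
    best = max(range(len(combined)), key=combined.__getitem__)
    return best if combined[best] >= threshold else None


if __name__ == "__main__":
    scores = simulate_scores(true_key=3, num_keys=8, num_outputs=400)
    print(attribute(scores))
```

Under these assumptions, the matching key's combined score grows like `signal * sqrt(n)`, while the non-matching keys stay near zero. A per-output signal too weak to flag any single sample can therefore still identify the entity once enough outputs are pooled.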
