SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
Xinjie Zhu, Zijing Zhao, Hui Jin, Qingxiao Guo, Yilong Ma, Yunhao Wang, Xiaobing Guo, Weifeng Zhang

TL;DR
SIGMark introduces a scalable, blind in-generation watermarking method for video diffusion models that maintains high robustness against disturbances and reduces computational costs, enhancing AI-generated video security.
Contribution
The paper presents SIGMark, a novel in-generation watermarking framework with blind extraction, addressing scalability and robustness issues in existing methods for video diffusion models.
Findings
Achieves high bit-accuracy under temporal and spatial disturbances
Reduces storage and computational costs for watermark extraction
Demonstrates scalability and robustness on modern diffusion models
Abstract
Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. Beyond post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they require maintaining all the message-key pairs and performing template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose…
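The non-blind versus blind distinction drawn in the abstract can be illustrated with a toy sketch. It is not the paper's method: it does not implement GF-PRC or SGO, and the XOR mask, the integer keys, and every function name below are assumptions made for illustration only. The non-blind extractor has to store every (message, key) pair and rebuild one template per pair, so its work grows with the number of issued watermarks. The blind extractor needs only one global key and does a fixed amount of work per query.

```python
import random

def keystream(key, n_bits):
    """Pseudorandom bit mask derived from an integer key (toy stand-in, not a real cipher)."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n_bits)]

def embed(message, key):
    """Hypothetical embedding: XOR the message bits with the key's mask."""
    return [m ^ s for m, s in zip(message, keystream(key, len(message)))]

def flip_bits(bits, k, rng):
    """Simulate a disturbance by flipping k distinct bit positions."""
    noisy = list(bits)
    for i in rng.sample(range(len(bits)), k):
        noisy[i] ^= 1
    return noisy

def bit_accuracy(a, b):
    """Fraction of positions where two bit lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def extract_non_blind(observed, table):
    """Template matching: rebuild one template per stored (message, key) pair
    and return the message whose template best matches the observation.
    Cost is linear in len(table)."""
    best = max(table, key=lambda pair: bit_accuracy(embed(pair[0], pair[1]), observed))
    return best[0]

def extract_blind(observed, global_key):
    """Blind decoding with a single global key: no table lookup.
    XOR is its own inverse, so embedding again removes the mask."""
    return embed(observed, global_key)
```

In this toy, the blind decoder returns the noisy message as-is, so any flipped bits remain flipped; a practical blind scheme would also need error correction to recover the exact message, which is outside the scope of this sketch.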
Peer Reviews
Decision·ICLR 2026 Poster
- The paper is well organized and clearly written.
- The intuition behind the paper is sound.
- Experimental results show the effectiveness of the proposed method.
- Missing references: [1] Huang, Huayang et al. "ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization." arXiv:2411.03862 (2024). Please also refer to the "Questions" section.
1. The paper identifies a practical limitation of many video in-generation watermarking systems: they are not truly blind and require storing large message-key/template tables, which raises efficiency and storage concerns at scale.
2. The proposed design addresses both the blindness/scalability issue (via GF-PRC) and the temporal robustness issue (via SGO).
3. The paper is generally well organized with clear figures, which makes it easy to follow.
1. Limited novelty (GF-PRC). The GF-PRC component mainly builds on the original PRC method. Moreover, introducing PRC (Appendix E) negatively affects robustness. How to balance the impact of PRC, or to improve PRC specifically for watermarking robustness requirements, remains an open problem.
2. Scalability evidence. The paper claims that non-blind approaches incur prohibitive computational costs at scale and raise efficiency/storage issues, but this discussion remains at the level of Appendix B (C
+ I like the "blind" feature they add to the diffusion-based video watermarking domain. It simplifies the watermarking deployment pipeline. PRC is a cryptographically strong primitive that enables encoding and decoding different messages with a single global key, in contrast to the traditional encryption methods used in other watermarking approaches, which require storing different keying material for different messages.
+ The proposed SGO module is effective in handling various frame-level attacks in videos.
- The technical contribution for watermark extraction should be articulated with more methodological comparison. (See my suggestion below.)
- Experimental settings are not very clear.
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
