SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models
Yang Yang, Xinze Zou, Zehua Ma, Han Fang, Weiming Zhang

TL;DR
SKeDA introduces a watermarking framework for text-to-video diffusion models that improves robustness against frame reordering, frame loss, and video distortions while maintaining high video quality.
Contribution
The paper proposes SKeDA, a new generative watermarking method that uses permutation-tolerant sampling and differential attention to improve robustness in video watermarking.
Findings
SKeDA achieves high watermark robustness against frame reordering and distortions.
The framework maintains high fidelity in generated videos.
Extensive experiments validate the effectiveness of SKeDA.
Abstract
The rise of text-to-video generation models has raised growing concerns over content authenticity, copyright protection, and malicious misuse. Watermarking serves as an effective mechanism for regulating such AI-generated content, where high fidelity and strong robustness are particularly critical. Recent generative image watermarking methods provide a promising foundation by leveraging watermark information and pseudo-random keys to control the initial sampling noise, enabling lossless embedding. However, directly extending these techniques to videos introduces two key limitations: (1) existing designs implicitly rely on strict alignment between video frames and the frame-dependent pseudo-random binary sequences used for watermark encryption, so once this alignment is disrupted, subsequent watermark extraction becomes unreliable; and (2) video-specific distortions, such as inter-frame compression,…
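The alignment limitation described in the abstract can be illustrated with a small toy sketch. It is not the paper's method: the diffusion model and the mapping from encrypted bits to initial sampling noise are omitted, and the names `frame_key`, `run_demo`, and the seed-derivation rule are hypothetical. It only shows that when each frame's watermark bits are XOR-encrypted with a frame-dependent pseudo-random binary sequence, decrypting after a frame shift uses the wrong sequence and bit accuracy falls to roughly chance.

```python
import random


def frame_key(master_seed, frame_index, n_bits):
    """Hypothetical frame-dependent pseudo-random binary sequence."""
    # Assumption: each frame's key is derived from a master seed and its index.
    rng = random.Random(master_seed * 1_000_003 + frame_index)
    return [rng.getrandbits(1) for _ in range(n_bits)]


def xor_bits(a, b):
    """Element-wise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]


def bit_accuracy(a, b):
    """Fraction of positions where the two bit lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def run_demo(n_frames=8, n_bits=4096, master_seed=7):
    """Return mean bit accuracy with aligned and with shifted frames."""
    msg_rng = random.Random(123)
    watermark = [msg_rng.getrandbits(1) for _ in range(n_bits)]

    # Encrypt the same watermark once per frame with that frame's key.
    encrypted = [
        xor_bits(watermark, frame_key(master_seed, t, n_bits))
        for t in range(n_frames)
    ]

    # Aligned extraction: frame t is decrypted with key t.
    aligned = [
        bit_accuracy(watermark,
                     xor_bits(encrypted[t], frame_key(master_seed, t, n_bits)))
        for t in range(n_frames)
    ]

    # Misaligned extraction: frames are rotated by one position (for example,
    # after a dropped or reordered frame), so key t meets the wrong frame.
    shifted = encrypted[1:] + encrypted[:1]
    misaligned = [
        bit_accuracy(watermark,
                     xor_bits(shifted[t], frame_key(master_seed, t, n_bits)))
        for t in range(n_frames)
    ]

    return sum(aligned) / n_frames, sum(misaligned) / n_frames


if __name__ == "__main__":
    aligned_acc, misaligned_acc = run_demo()
    print(f"aligned bit accuracy:    {aligned_acc:.3f}")
    print(f"misaligned bit accuracy: {misaligned_acc:.3f}")
```

In this toy setting the aligned case recovers every bit, while the shifted case is close to a coin flip, which is the failure mode that motivates a scheme tolerant to frame permutation.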
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Chaos-based Image/Signal Encryption · Generative Adversarial Networks and Image Synthesis
