Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
Zihan Su, Xuerui Qiu, Hongbin Xu, Tangyu Jiang, Junhao Zhuang, Chun Yuan, Ming Li, Shengfeng He, Fei Richard Yu

TL;DR
Safe-Sora is a graphical watermarking framework for text-to-video generation. It embeds an invisible watermark image during generation to support copyright protection, using hierarchical matching and spatiotemporal modeling to preserve video quality and robustness.
Contribution
Safe-Sora is the first framework to embed graphical watermarks directly into the video generation process, using hierarchical matching and state space models for enhanced robustness.
Findings
Achieves state-of-the-art watermark robustness and fidelity.
Maintains high video quality with embedded watermarks.
Demonstrates effective long-range dependency modeling in watermarking.
Abstract
The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual similarity between the watermark and cover content, we introduce a hierarchical coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is divided into patches, each assigned to the most visually similar video frame, and further localized to the optimal spatial region for seamless embedding. To enable spatiotemporal fusion of watermark patches across video frames, we develop a 3D…
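The coarse-to-fine matching described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the similarity measures here (mean intensity for the coarse frame choice, mean absolute pixel difference for the fine region search) are stand-in assumptions, and the names `split_into_patches`, `best_region`, and `match_patches` are hypothetical. The abstract does not specify how visual similarity is computed, so the sketch only shows the control flow: split the watermark into patches, pick a frame per patch, then pick a region within that frame.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel intensities (an assumption made for brevity).
Image = List[List[float]]


def split_into_patches(img: Image, size: int) -> List[Image]:
    """Cut an image into non-overlapping size x size patches, in row-major order."""
    h, w = len(img), len(img[0])
    return [
        [row[x:x + size] for row in img[y:y + size]]
        for y in range(0, h - size + 1, size)
        for x in range(0, w - size + 1, size)
    ]


def mean_intensity(img: Image) -> float:
    """Average pixel value of an image."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


def mean_abs_diff(a: Image, b: Image) -> float:
    """Stand-in dissimilarity: mean absolute pixel difference (lower is more similar)."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def best_region(patch: Image, frame: Image) -> Tuple[float, int, int]:
    """Fine stage: slide the patch over the frame and return (score, y, x)
    for the window with the lowest dissimilarity."""
    size = len(patch)
    h, w = len(frame), len(frame[0])
    best = (float("inf"), 0, 0)
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            window = [row[x:x + size] for row in frame[y:y + size]]
            score = mean_abs_diff(patch, window)
            if score < best[0]:
                best = (score, y, x)
    return best


def match_patches(watermark: Image, frames: List[Image], size: int) -> List[Tuple[int, int, int]]:
    """Return one (frame_index, y, x) placement per watermark patch.

    Coarse stage: choose the frame whose mean intensity is closest to the patch's.
    Fine stage: choose the best-matching window inside that frame.
    """
    placements = []
    for patch in split_into_patches(watermark, size):
        target = mean_intensity(patch)
        frame_idx = min(
            range(len(frames)),
            key=lambda i: abs(mean_intensity(frames[i]) - target),
        )
        _, y, x = best_region(patch, frames[frame_idx])
        placements.append((frame_idx, y, x))
    return placements
```

Calling `match_patches(watermark, frames, size)` with a watermark and a list of equally sized frames yields one placement per patch. A learned feature similarity would presumably replace both stand-in measures in a real system; the brute-force window search is kept only because it makes the fine stage easy to read.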
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
