Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion
Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen, Soo Ye Kim, Nanxuan Zhao, Haitian Zheng, Qing Liu, He Zhang, Zhe Lin, Yuqian Zhou, Jiebo Luo

TL;DR
Tri-Prompting introduces a unified video diffusion framework that enables precise control over scene, subject, and motion, enhancing multi-view consistency and identity preservation for versatile content creation.
Contribution
It presents a novel unified architecture and training paradigm that integrates scene, subject, and motion control in video diffusion models, supporting complex, multi-faceted video editing tasks.
Findings
Outperforms specialized baselines in multi-view subject identity preservation
Achieves superior 3D consistency and motion accuracy
Enables novel workflows like 3D-aware subject insertion
Abstract
Recent video diffusion models have made remarkable strides in visual quality, yet precise, fine-grained control remains a key bottleneck that limits practical customizability for content creation. For AI video creators, three forms of control are crucial: (i) scene composition, (ii) multi-view consistent subject customization, and (iii) camera-pose or object-motion adjustment. Existing methods typically handle these dimensions in isolation, with limited support for multi-view subject synthesis and identity preservation under arbitrary pose changes. This lack of a unified architecture makes it difficult to support versatile, jointly controllable video. We introduce Tri-Prompting, a unified framework and two-stage training paradigm that integrates scene composition, multi-view subject consistency, and motion control. Our approach leverages a dual-condition motion module driven by 3D…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Pose and Action Recognition
