CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai

TL;DR
CineMaster introduces a 3D-aware, controllable text-to-video generation framework that enables precise scene editing and camera manipulation, leveraging an interactive workflow and automated data annotation to outperform existing methods.
Contribution
The paper presents a novel two-stage framework combining user-controlled scene construction with a 3D-guided diffusion model for improved text-to-video generation.
Findings
Outperforms existing 3D-aware text-to-video methods
Enables intuitive scene and camera control
Uses automated data annotation for large-scale training
Abstract
In this work, we present CineMaster, a novel framework for 3D-aware and controllable text-to-video generation. Our goal is to give users controllability comparable to that of professional film directors: precise placement of objects within the scene, flexible manipulation of both objects and camera in 3D space, and intuitive layout control over the rendered frames. To achieve this, CineMaster operates in two stages. In the first stage, we design an interactive workflow that allows users to intuitively construct 3D-aware conditional signals by positioning object bounding boxes and defining camera movements within the 3D space. In the second stage, these control signals (rendered depth maps, camera trajectories, and object class labels) guide a text-to-video diffusion model so that the generated video matches the user's intent. Furthermore, to overcome the…
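The first stage described in the abstract, placing 3D bounding boxes and moving a camera, can be illustrated with a small geometric sketch. This is not the paper's implementation. It assumes a simple pinhole camera that looks down the +Z axis with no rotation, and the names `box_corners`, `camera_trajectory`, and `project_box` are hypothetical helpers introduced only for this example. For each camera position it returns the 2D box footprint in pixels and the nearest depth, the kind of per-frame geometric signal a depth rendering would be built from.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box as (x, y, z) tuples."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def camera_trajectory(start, end, num_frames):
    """Linearly interpolate camera positions from start to end (inclusive)."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [
        tuple(s + (e - s) * i / (num_frames - 1) for s, e in zip(start, end))
        for i in range(num_frames)
    ]


def project_box(corners, cam_pos, focal, width, height):
    """Project box corners with an unrotated pinhole camera (an assumption).

    Returns (u_min, v_min, u_max, v_max, nearest_depth): the 2D pixel
    bounding box of the projected corners and the smallest corner depth.
    """
    us, vs, depths = [], [], []
    for x, y, z in corners:
        dx, dy, dz = x - cam_pos[0], y - cam_pos[1], z - cam_pos[2]
        if dz <= 0:
            raise ValueError("box corner is at or behind the camera plane")
        us.append(focal * dx / dz + width / 2.0)
        vs.append(focal * dy / dz + height / 2.0)
        depths.append(dz)
    return min(us), min(vs), max(us), max(vs), min(depths)


if __name__ == "__main__":
    corners = box_corners(center=(0.0, 0.0, 5.0), size=(2.0, 2.0, 2.0))
    # Hypothetical dolly-in: the camera moves 2 units toward the box.
    for pos in camera_trajectory((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 3):
        print(pos, project_box(corners, pos, focal=100.0, width=640, height=480))
```

As the camera moves toward the box, the projected footprint grows and the nearest depth shrinks. A real system would also handle camera rotation and rasterize full depth maps rather than a single footprint per object.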
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Computer Graphics and Visualization Techniques
Methods: Diffusion
