Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

TL;DR
Director3D is a novel framework that generates realistic 3D scenes and camera trajectories from text by modeling camera paths and scene details with diffusion models, outperforming existing methods.
Contribution
The paper introduces Director3D, a comprehensive open-world text-to-3D generation framework that models real-world camera trajectories and scene details using diffusion-based components.
Findings
Outperforms existing 3D generation methods in real-world scenarios
Effectively models complex, scene-specific camera trajectories
Produces pixel-aligned 3D scene representations with high consistency
Abstract
Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions. (2) Next, a Gaussian-driven Multi-view Latent Diffusion Model serves as the Decorator, modeling the image sequence distribution given the camera trajectories…
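As a rough illustration of step (1), sampling a camera trajectory with a text-conditioned diffusion model can be sketched as a standard DDPM-style reverse loop over a set of camera poses. This is a minimal sketch under stated assumptions, not the paper's implementation: the 7-value pose layout, the linear noise schedule, the step count, and the `toy_noise_predictor` placeholder (standing in for a trained, text-conditioned network) are all hypothetical choices made for illustration.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption); returns betas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(x_t, t, text_embedding):
    """Hypothetical placeholder for a trained denoiser conditioned on text."""
    bias = sum(text_embedding) / len(text_embedding)
    return [[0.1 * v + 0.01 * bias for v in pose] for pose in x_t]


def sample_trajectory(text_embedding, num_cameras=8, pose_dim=7,
                      num_steps=50, seed=0):
    """Ancestral sampling: start from Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    # One row per camera; pose_dim=7 is an assumed layout, not taken from the paper.
    x = [[rng.gauss(0.0, 1.0) for _ in range(pose_dim)]
         for _ in range(num_cameras)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, t, text_embedding)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # No noise is added on the final step (t == 0).
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [[scale * (v - coef * e) + sigma * rng.gauss(0.0, 1.0)
              for v, e in zip(pose, eps_pose)]
             for pose, eps_pose in zip(x, eps)]
    return x


trajectory = sample_trajectory(text_embedding=[0.1, 0.2, 0.3])
print(len(trajectory), len(trajectory[0]))
```

In a two-stage design like the one the abstract describes, the sampled trajectory would then be passed as conditioning to the image-sequence model in step (2); that stage is omitted here.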
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Handwritten Text Recognition Techniques
Methods: Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Diffusion · Position-Wise Feed-Forward Layer · Dropout · Adam · Latent Diffusion Model · Attention Is All You Need
