Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

Xinyang Li; Zhangyu Lai; Linning Xu; Yansong Qu; Liujuan Cao; Shengchuan Zhang; Bo Dai; Rongrong Ji

arXiv:2406.17601·cs.CV·June 26, 2024

Open Access

TL;DR

Director3D is a text-to-3D framework that generates both realistic 3D scenes and the camera trajectories used to view them. It models camera paths and scene content with separate diffusion models, and the authors report that it outperforms existing 3D generation methods.

Contribution

The paper introduces Director3D, a comprehensive open-world text-to-3D generation framework that models real-world camera trajectories and scene details using diffusion-based components.

Findings

01. Outperforms existing 3D generation methods in real-world scenarios

02. Effectively models complex, scene-specific camera trajectories

03. Produces pixel-aligned 3D scene representations with high consistency

Abstract

Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions. (2) Next, a Gaussian-driven Multi-view Latent Diffusion Model serves as the Decorator, modeling the image sequence distribution given the camera trajectories…
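The two stages named in the abstract can be read as a simple data flow: text goes in, a camera trajectory comes out of stage (1), and that trajectory then conditions the image sequence produced in stage (2). The sketch below only illustrates that flow. Every name in it (`CameraTrajectory`, `sample_trajectory`, `generate_views`, `generate_scene`) is hypothetical and is not taken from the official repository, and both stages are stubs: the first returns a fixed circular orbit instead of sampling from a learned distribution, and the second returns blank images instead of running a diffusion model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]          # 4x4 camera-to-world matrix
Image = List[List[Tuple[float, float, float]]]  # height x width RGB


@dataclass
class CameraTrajectory:
    poses: List[Matrix]  # one pose per frame


def sample_trajectory(prompt: str, num_frames: int = 8,
                      radius: float = 2.0) -> CameraTrajectory:
    """Stub for stage (1), the 'Cinematographer'.

    A real model would sample poses conditioned on `prompt`. This
    placeholder ignores the prompt and returns a circular orbit
    around the y axis.
    """
    poses = []
    for i in range(num_frames):
        a = 2.0 * math.pi * i / num_frames
        c, s = math.cos(a), math.sin(a)
        poses.append([
            [c,   0.0, s,   radius * s],
            [0.0, 1.0, 0.0, 0.0],
            [-s,  0.0, c,   radius * c],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return CameraTrajectory(poses)


def generate_views(prompt: str, trajectory: CameraTrajectory,
                   height: int = 32, width: int = 32) -> List[Image]:
    """Stub for stage (2), the 'Decorator'.

    A real model would generate an image sequence given the prompt
    and the camera poses. This placeholder returns one blank image
    per pose.
    """
    return [
        [[(0.0, 0.0, 0.0) for _ in range(width)] for _ in range(height)]
        for _ in trajectory.poses
    ]


def generate_scene(prompt: str, num_frames: int = 8):
    # Stage (1) runs first; its output conditions stage (2).
    trajectory = sample_trajectory(prompt, num_frames)
    views = generate_views(prompt, trajectory)
    return trajectory, views


if __name__ == "__main__":
    traj, views = generate_scene("a cabin in a snowy forest", num_frames=4)
    print(len(traj.poses), "poses,", len(views), "views")
```

The point of the split is that the camera path is generated before any image content, so the second stage always receives an explicit trajectory rather than a fixed, predefined set of cameras.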

Code & Models

Repositories

imlixinyang/director3d (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Handwritten Text Recognition Techniques

Methods: Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Diffusion · Position-Wise Feed-Forward Layer · Dropout · Adam · Latent Diffusion Model · Attention Is All You Need