DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Wenqiang Sun; Shuo Chen; Fangfu Liu; Zilong Chen; Yueqi Duan; Jun Zhang; Yikai Wang

arXiv:2411.04928·cs.CV·November 8, 2024

Open Access · 1 Model

TL;DR

DimensionX is a novel framework that generates controllable, photorealistic 3D and 4D scenes from a single image by decoupling spatial and temporal factors in video diffusion, enabling precise scene reconstruction.

Contribution

We introduce ST-Director, which decouples spatial and temporal factors in video diffusion by learning dimension-aware LoRAs from dimension-variant data, improving controllability in 3D and 4D scene generation from a single image.

Findings

01

Outperforms previous methods in controllability and realism

02

Effectively reconstructs 3D and 4D scenes from limited input

03

Demonstrates superior results on real-world and synthetic datasets

Abstract

In this paper, we introduce DimensionX, a framework designed to generate photorealistic 3D and 4D scenes from just a single image with video diffusion. Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames. While recent video diffusion models have shown remarkable success in producing vivid visuals, they face limitations in directly recovering 3D/4D scenes due to limited spatial and temporal controllability during generation. To overcome this, we propose ST-Director, which decouples spatial and temporal factors in video diffusion by learning dimension-aware LoRAs from dimension-variant data. This controllable video diffusion approach enables precise manipulation of spatial structure and temporal dynamics, allowing us to reconstruct both 3D and…
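
The abstract describes ST-Director as learning separate dimension-aware LoRAs for spatial and temporal control. The sketch below illustrates only the generic LoRA mechanism (a frozen weight plus a switchable low-rank update) with two named adapters. It is not the authors' implementation: the class `LoRALinear`, the adapter names `"spatial"` and `"temporal"`, and all shapes and hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer with named, switchable low-rank adapters (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=4.0):
        self.weight = weight          # frozen base weight, shape (out_dim, in_dim)
        self.rank = rank
        self.scale = alpha / rank     # standard LoRA scaling factor
        self.adapters = {}            # adapter name -> (A, B)
        self.active = None            # name of the adapter in use, or None for base only

    def add_adapter(self, name, rng):
        out_dim, in_dim = self.weight.shape
        # Usual LoRA initialisation: A is small random, B is zero,
        # so a freshly added adapter leaves the layer output unchanged.
        A = rng.normal(scale=0.01, size=(self.rank, in_dim))
        B = np.zeros((out_dim, self.rank))
        self.adapters[name] = (A, B)

    def forward(self, x):
        y = x @ self.weight.T
        if self.active is not None:
            A, B = self.adapters[self.active]
            # Low-rank update: x (W + scale * B A)^T = x W^T + scale * (x A^T) B^T
            y = y + self.scale * (x @ A.T) @ B.T
        return y


rng = np.random.default_rng(0)
layer = LoRALinear(rng.normal(size=(8, 16)))
layer.add_adapter("spatial", rng)    # hypothetical adapter for camera/viewpoint variation
layer.add_adapter("temporal", rng)   # hypothetical adapter for scene-motion variation

x = rng.normal(size=(2, 16))
base_out = layer.forward(x)          # base model only

layer.active = "spatial"
spatial_out = layer.forward(x)       # identical to base_out until B is trained
```

In this sketch, switching `layer.active` selects which low-rank update is applied on top of the same frozen weight, which is one simple way to keep two learned behaviours separate. How the paper combines or schedules its adapters during sampling is not stated in the abstract excerpt above.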

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Models

🤗
wenqsun/DimensionX
model · ♡ 61

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion