SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations

Songchun Zhang; Huiyao Xu; Sitong Guo; Zhongwei Xie; Hujun Bao; Weiwei Xu; Changqing Zou

arXiv:2505.11992 · cs.CV · July 14, 2025

TL;DR

SpatialCrafter uses a video diffusion model, guided by explicit camera and geometric constraints, to generate plausible additional views of a scene, enabling photorealistic 3D reconstruction from sparse or even single-view inputs.

Contribution

The paper introduces a framework that combines video diffusion models, explicit geometric constraints, and monocular depth priors for sparse-view 3D scene reconstruction.
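The summary does not say how the depth priors are reconciled with each dataset's scale (the abstract only mentions a "unified scale estimation strategy"). As a hedged illustration of the underlying problem, and not the paper's method, a standard building block is a least-squares scale-and-shift fit that maps a relative monocular depth (or disparity, depending on what the prior predicts) onto a reference with metric or dataset-specific scale. The function name `align_scale_shift` and its validity rule are hypothetical.

```python
import numpy as np


def align_scale_shift(pred, ref, valid=None):
    """Hypothetical helper: fit (s, b) minimising sum((s * pred + b - ref) ** 2)
    over valid pixels, so that s * pred + b expresses `pred` in the scale of `ref`.

    By default, pixels with non-finite or non-positive reference values are ignored.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    if valid is None:
        valid = np.isfinite(ref) & (ref > 0)
    else:
        valid = np.asarray(valid, dtype=bool).ravel()
    # Design matrix [pred, 1] turns the fit into an ordinary linear least-squares problem.
    A = np.stack([pred[valid], np.ones(int(valid.sum()))], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, ref[valid], rcond=None)
    return float(s), float(b)
```

Once `s` and `b` are known, `s * pred + b` places the prior's output in the reference scale. A single global affine fit is the simplest choice; per-dataset or per-scene fits are an equally plausible reading of "unified scale estimation" and would reuse the same solver.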

Findings

1. Improves 3D scene reconstruction from sparse views.
2. Generates realistic 3D scene appearances.
3. Achieves precise camera control and 3D consistency.
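The abstract credits camera control to a trainable camera encoder but does not describe that encoder's input. Many camera-controlled video diffusion models condition on per-pixel Plücker ray embeddings; assuming SpatialCrafter's encoder consumes something similar (an assumption, not confirmed here), the sketch below computes the 6-channel embedding (ray moment, ray direction) for one camera. The names `plucker_embedding`, `R_c2w`, and `cam_center` are illustrative.

```python
import numpy as np


def plucker_embedding(K, R_c2w, cam_center, h, w):
    """Per-pixel Plücker coordinates (o x d, d) for a pinhole camera.

    K: 3x3 intrinsics; R_c2w: 3x3 camera-to-world rotation;
    cam_center: camera centre o in world coordinates. Returns (h, w, 6).
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel centres, shape (h, w, 3).
    pix = np.stack([xs + 0.5, ys + 0.5, np.ones((h, w))], axis=-1)
    # Back-project to camera rays, then rotate them into the world frame.
    dirs = pix @ np.linalg.inv(K).T @ R_c2w.T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Moment o x d encodes where the ray sits in space, independent of the point chosen on it.
    moment = np.cross(np.asarray(cam_center, dtype=float), dirs)
    return np.concatenate([moment, dirs], axis=-1)
```

Because the moment is always orthogonal to the direction, the embedding identifies each ray uniquely, which is why it is a popular dense camera signal for a learned encoder.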

Abstract

Novel view synthesis (NVS) boosts immersive experiences in computer vision and graphics. Although existing techniques have made progress, they rely on dense multi-view observations, which restricts their application. This work takes on the challenge of reconstructing photorealistic 3D scenes from sparse or single-view inputs. We introduce SpatialCrafter, a framework that leverages the rich knowledge in video diffusion models to generate plausible additional observations, thereby alleviating reconstruction ambiguity. Through a trainable camera encoder and an epipolar attention mechanism for explicit geometric constraints, we achieve precise camera control and 3D consistency, further reinforced by a unified scale estimation strategy to handle scale discrepancies across datasets. Furthermore, by integrating monocular depth priors with semantic features in the video latent space, our framework directly…
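The abstract names an epipolar attention mechanism but does not spell out its construction. A common way to impose epipolar constraints in multi-view attention, assumed here rather than taken from the paper, is to let a query pixel in one view attend only to key pixels lying near its epipolar line in another view. The sketch below builds such a mask from intrinsics and relative pose via the fundamental matrix, then applies it inside ordinary softmax attention. The function names, the `(R, t)` pose convention, and the pixel `threshold` are illustrative choices.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """F with p2^T F p1 = 0 for matching homogeneous pixels p1, p2.

    Assumed pose convention: X2 = R @ X1 + t maps camera-1 coordinates
    into camera-2 coordinates.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


def epipolar_mask(F, h, w, threshold=1.0):
    """(h*w, h*w) boolean mask: entry [i, j] is True when pixel j of view 2
    lies within `threshold` pixels of the epipolar line of pixel i of view 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel centres, flattened row-major (index = y * w + x).
    pts = np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5, np.ones(h * w)], axis=1)
    lines = pts @ F.T  # row i holds the line l_i = F @ p_i in view 2
    # Point-to-line distance |l . q| / sqrt(a^2 + b^2) for every (i, j) pair.
    norms = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    dist = np.abs(lines @ pts.T) / norms
    return dist <= threshold


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to keys with mask[i, j] True.
    Queries whose mask row is empty fall back to attending to every key."""
    mask = mask | ~mask.any(axis=1, keepdims=True)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 removes masked keys
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

In a video diffusion backbone, `q` would plausibly come from the latent tokens of one frame and `k`, `v` from another, with `(R, t)` taken from their relative camera pose; that wiring is likewise an assumption. A hard mask like this one is the simplest variant; soft distance-based weighting is a common alternative.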

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Softmax · Attention Is All You Need · Diffusion