TL;DR
Extend3D introduces a novel, training-free pipeline for generating large-scale 3D scenes from a single image by extending and dividing the latent space, refining patches, and optimizing for better structure and texture fidelity.
Contribution
The paper presents a new method that extends object-centric 3D generative models to scene-scale generation without training, using patch-wise generation and 3D-aware optimization.
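A minimal sketch of the patch-wise idea follows. It assumes the extended latent is a NumPy array of shape (X, Y, Z, C), widened along its two horizontal axes (taken here as the first two). It also assumes each patch matches the object-centric model's native x-y size, and that patches are coupled by averaging their outputs where they overlap. That averaging rule is a MultiDiffusion-style choice made here for illustration; the summary does not state the exact coupling. `denoise_patch` is a hypothetical stand-in for one step of the base model. The real model would also be conditioned on the matching image crop.

```python
import numpy as np

def patch_starts(length, patch, stride):
    """Start offsets along one axis so overlapping patches cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # make the last patch reach the far edge
    return starts

def coupled_step(latent, denoise_patch, patch, stride):
    """One coupled time step (sketch): run the hypothetical per-patch model on
    every overlapping x-y window, then average outputs where windows overlap."""
    out = np.zeros_like(latent)
    # One counter per x-y cell, broadcast over the remaining (Z, C) axes.
    count = np.zeros(latent.shape[:2] + (1,) * (latent.ndim - 2))
    for x0 in patch_starts(latent.shape[0], patch, stride):
        for y0 in patch_starts(latent.shape[1], patch, stride):
            window = latent[x0:x0 + patch, y0:y0 + patch]
            out[x0:x0 + patch, y0:y0 + patch] += denoise_patch(window)
            count[x0:x0 + patch, y0:y0 + patch] += 1
    return out / count
```

With a stand-in denoiser that simply scales its input, every cell receives identical contributions from all patches covering it. So `coupled_step(latent, lambda w: 0.5 * w, 16, 8)` equals `0.5 * latent`. A stride smaller than the patch size is what creates the overlap through which neighboring patches share information at every step.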
Findings
Outperforms prior methods in human preference tests.
Achieves higher quantitative scores than prior methods for 3D scene generation.
Completes missing 3D structure via a technique the authors call under-noising.
Abstract
In this paper, we propose Extend3D, a training-free pipeline for 3D scene generation from a single image, built upon an object-centric 3D generative model. To overcome the limitations of fixed-size latent spaces in object-centric models for representing wide scenes, we extend the latent space in the x and y directions. Then, by dividing the extended latent space into overlapping patches, we apply the object-centric 3D generative model to each patch and couple them at each time step. Since patch-wise 3D generation with image conditioning requires strict spatial alignment between image and latent patches, we initialize the scene using a point cloud prior from a monocular depth estimator and iteratively refine occluded regions through SDEdit. We discovered that treating the incompleteness of 3D structure as noise during 3D refinement enables 3D completion via a concept, which we term…
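The initialization and refinement steps can be sketched under a few stated assumptions. The sketch uses a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy` to unproject the monocular depth map into a camera-space point cloud, and an axis-aligned voxel grid for the occupancy prior. It also assumes a rectified-flow base model whose noised state is x_t = (1 - t) x_0 + t ε; a DDPM-style model would instead weight x_0 and ε by sqrt(ᾱ_t) and sqrt(1 - ᾱ_t). SDEdit then starts denoising from an intermediate time t0 rather than from pure noise. This keeps the coarse structure from the point cloud while letting the model revise details. All function names are hypothetical. The denoising loop from t0 back to 0, and how occluded regions are selected for iterative refinement, are not shown.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Unproject an H x W depth map (z-depth, assumed) to camera-space points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # pixel row (v) and column (u) indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid or non-positive depths.
    return pts[np.isfinite(pts).all(axis=1) & (pts[:, 2] > 0)]

def voxelize(points, grid_min, voxel_size, shape):
    """Occupancy prior: mark every voxel containing at least one point."""
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    occ = np.zeros(shape, dtype=np.float32)
    occ[tuple(idx[keep].T)] = 1.0
    return occ

def sdedit_start(x0, t0, rng):
    """SDEdit starting point: noise the initialization to time t0 in [0, 1]
    using the rectified-flow interpolation (1 - t) * x0 + t * eps (assumed)."""
    eps = rng.standard_normal(x0.shape).astype(x0.dtype)
    return (1.0 - t0) * x0 + t0 * eps
```

For example, `sdedit_start(voxelize(depth_to_points(depth, fx, fy, cx, cy), grid_min, voxel_size, shape), t0, np.random.default_rng(0))` yields a partially noised starting point for the base model. A smaller t0 preserves more of the depth prior, and a larger t0 gives the model more freedom to fill in regions the depth map could not see.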
