Self-Evolving 3D Scene Generation from a Single Image

Kaizhi Zheng; Yue Fan; Jing Gu; Zishuo Xu; Xuehai He; Xin Eric Wang

arXiv:2512.08905 · cs.CV · December 10, 2025

TL;DR

EvoScene is a self-evolving, training-free framework that progressively reconstructs detailed 3D scenes from a single image by combining geometric reasoning and visual knowledge through iterative 2D-3D domain alternation.

Contribution

EvoScene introduces a novel self-evolving, training-free method that iteratively improves 3D scene reconstruction from a single image by integrating existing models (3D generation and video generation models) in a multi-stage process.

Findings

01

Achieves superior geometric stability and view-consistent textures.

02

Effectively completes unseen regions in 3D scenes.

03

Produces ready-to-use 3D meshes for practical applications.

Abstract

Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover reasonable geometry from single views, but their object-centric training limits generalization to complex, large-scale scenes with faithful structure and texture. We present EvoScene, a self-evolving, training-free framework that progressively reconstructs complete 3D scenes from single images. The key idea is combining the complementary strengths of existing models: geometric reasoning from 3D generation models and visual knowledge from video generation models. Through three iterative stages (Spatial Prior Initialization, Visual-guided 3D Scene Mesh Generation, and Spatial-guided Novel View Generation), EvoScene alternates between 2D and 3D domains, gradually improving both structure and appearance. Experiments on diverse scenes…
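
The abstract names the stages but not their interfaces. As a rough illustration of the 2D-3D alternation only, the Python sketch below models the control flow under stated assumptions. Every name in it (`SceneState`, `spatial_prior_initialization`, `visual_guided_mesh_generation`, `spatial_guided_novel_views`, `evolve_scene`) is a hypothetical placeholder, not EvoScene's code. It assumes initialization runs once and the other two stages then alternate for a fixed number of rounds. The "models" are stubs that only record view names, a mesh version counter, and which domain each step worked in.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class SceneState:
    """Hypothetical scene state: collected 2D views plus a 3D mesh stand-in."""
    views: List[str] = field(default_factory=list)
    mesh_version: int = 0                            # stand-in for the current 3D scene mesh
    trace: List[str] = field(default_factory=list)   # domain each step worked in


def spatial_prior_initialization(image: str) -> SceneState:
    # Stage 1 (assumed to run once): seed the state with the input view
    # and a coarse 3D prior, represented here as mesh version 1.
    return SceneState(views=[image], mesh_version=1, trace=["init"])


def visual_guided_mesh_generation(state: SceneState) -> SceneState:
    # Stage 2 (2D -> 3D): a real system would refine the mesh from all
    # current views; this stub only bumps the mesh version.
    return replace(state, mesh_version=state.mesh_version + 1,
                   trace=state.trace + ["3D"])


def spatial_guided_novel_views(state: SceneState, step: int) -> SceneState:
    # Stage 3 (3D -> 2D): a real system would generate new views conditioned
    # on the current mesh; this stub appends a placeholder view name.
    return replace(state, views=state.views + [f"novel_view_{step}"],
                   trace=state.trace + ["2D"])


def evolve_scene(image: str, num_iterations: int) -> SceneState:
    # Hypothetical driver: alternate 3D and 2D stages for a fixed number of rounds.
    state = spatial_prior_initialization(image)
    for step in range(num_iterations):
        state = visual_guided_mesh_generation(state)
        state = spatial_guided_novel_views(state, step)
    return state


if __name__ == "__main__":
    result = evolve_scene("input.png", num_iterations=3)
    print(result.mesh_version, len(result.views))  # prints "4 4"
    print(result.trace)  # prints "['init', '3D', '2D', '3D', '2D', '3D', '2D']"
```

The abstract does not state how many rounds are run, what stopping criterion is used, or whether the loop ends on a mesh step. The fixed iteration count and the stage order inside each round are assumptions of this sketch.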

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging