GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation

Yuhao Wan; Lijuan Liu; Jingzhi Zhou; Zihan Zhou; Xuying Zhang; Dongbo Zhang; Shaohui Jiao; Qibin Hou; Ming-Ming Cheng

arXiv:2511.23191 · cs.CV · December 1, 2025

TL;DR

GeoWorld introduces a novel pipeline that leverages full-frame geometry features from video frames to improve high-fidelity 3D scene generation from a single image, addressing distortions and blurriness in prior methods.

Contribution

The paper proposes first generating consecutive video frames and then extracting geometry features from them with a geometry model. It adds a geometry alignment loss and a geometry adaptation module to improve the quality of the generated 3D scenes.

Findings

01. Outperforms prior methods qualitatively and quantitatively

02. Generates high-fidelity 3D scenes from a single image and a camera trajectory

03. Uses geometry features effectively to improve geometric consistency

Abstract

Previous works that leverage video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the image-to-3D scene generation pipeline by unlocking the potential of geometry models, and present GeoWorld. Instead of exploiting geometric information obtained from a single input frame, we propose to first generate consecutive video frames and then use a geometry model to extract full-frame geometry features. These features contain richer information than the single-frame depth maps or camera embeddings used in previous methods, and we use them as geometric conditions to aid the video generation model. To enhance the consistency of geometric structures, we further propose a geometry alignment loss to provide the model with real-world geometric constraints and a geometry adaptation module to ensure…
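The abstract is cut off before it defines the geometry alignment loss, so its exact form is not stated here. As a hedged illustration only, the sketch below assumes one common way of aligning a generator's intermediate features with features from a frozen encoder: project each video-model token with a learned linear map and penalize low cosine similarity to the matching geometry-model token, averaged over the tokens of all frames. The names `geometry_alignment_loss`, `project`, `proj`, and the flattened token layout are hypothetical assumptions, not GeoWorld's actual implementation.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: NOT GeoWorld's published loss. It assumes a per-token
# cosine-similarity alignment between projected video-model features and
# geometry-model features, flattened over all frames ("full-frame").

Vector = List[float]
Matrix = List[List[float]]


def project(weight: Matrix, x: Sequence[float]) -> Vector:
    """Apply a linear projection `weight` (out_dim x in_dim) to vector `x`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity with an epsilon guard against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def geometry_alignment_loss(
    video_tokens: Sequence[Sequence[float]],
    geo_tokens: Sequence[Sequence[float]],
    proj: Matrix,
) -> float:
    """Return 1 - mean cosine similarity over aligned token pairs.

    video_tokens: hidden features from the video generator (assumed layout:
        one vector per spatial token, concatenated across all frames).
    geo_tokens: features from a frozen geometry model for the same tokens.
    proj: hypothetical learned projection from video to geometry feature size.
    The result lies in [0, 2] up to floating-point rounding; 0 means the
    projected features point in the same directions as the geometry features.
    """
    if len(video_tokens) != len(geo_tokens):
        raise ValueError("video_tokens and geo_tokens must have the same length")
    if not video_tokens:
        raise ValueError("need at least one token pair")
    sims = [
        cosine_similarity(project(proj, v), g)
        for v, g in zip(video_tokens, geo_tokens)
    ]
    return 1.0 - sum(sims) / len(sims)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    print(geometry_alignment_loss(tokens, tokens, identity))
```

With an identity projection and identical tokens the loss is essentially zero; orthogonal token pairs give 1 and opposite pairs give 2. In an actual training setup the projection would presumably be learned jointly with the video model and the loss added to the main generation objective with some weight; both points are assumptions, since the abstract does not describe them.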

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis