NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Yuxue Yang, Lue Fan, Ziqi Shi, Junran Peng, Feng Wang, Zhaoxiang Zhang

TL;DR
NeoVerse introduces a scalable 4D world model that leverages monocular videos for reconstruction and video generation, overcoming previous limitations of data requirements and pre-processing, and achieving state-of-the-art results.
Contribution
NeoVerse presents a novel framework enabling scalable 4D reconstruction and video generation from in-the-wild monocular videos, with pose-free processing and degradation simulation.
Findings
Achieves state-of-the-art performance in 4D reconstruction benchmarks.
Demonstrates versatility across diverse domains.
Operates effectively with monocular videos without extensive pre-processing.
Abstract
In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse is built upon a core philosophy that makes the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs empower NeoVerse with versatility and generalization to various domains. Meanwhile, NeoVerse achieves state-of-the-art performance in standard reconstruction and generation benchmarks. Our project page is available at…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
