SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

Chong Xia; Kai Zhu; Zizhuo Wang; Fangfu Liu; Zhizheng Zhang; Yueqi Duan

arXiv:2603.02133 · cs.CV · March 4, 2026

Open Access

TL;DR

SimRecon introduces a novel pipeline for compositional scene reconstruction from real videos, integrating semantic reconstruction, object generation, and assembly with modules enhancing visual fidelity and physical plausibility.

Contribution

It proposes a new perception-generation-simulation framework with bridging modules for improved realism and physical accuracy in scene reconstruction from videos.

Findings

01

Outperforms previous methods on the ScanNet dataset

02

Achieves higher visual fidelity in reconstructed assets

03

Ensures physically plausible scene assembly

Abstract

Compositional scene reconstruction seeks to create object-centric representations, rather than holistic scenes, from real-world videos; such representations are natively applicable to simulation and interaction. Conventional compositional reconstruction approaches primarily emphasize visual appearance and show limited generalization to real-world scenarios. In this paper, we propose SimRecon, a framework that realizes a "Perception-Generation-Simulation" pipeline for cluttered scene reconstruction: it first conducts scene-level semantic reconstruction from video input, then performs single-object generation, and finally assembles these assets in the simulator. However, naively combining these three stages leads to visual infidelity of the generated assets and physical implausibility of the final scene, a problem that is particularly severe for complex scenes. Thus, we further propose two bridging…
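
The three-stage ordering described in the abstract can be pictured as a simple composition of stages. The sketch below is illustrative only and is not the authors' code: every name in it (`ObjectInstance`, `Asset`, `Scene`, `run_pipeline`, and the `stub_*` functions) is hypothetical, the stages are placeholders, and the bridging modules are omitted because the abstract text is truncated before describing them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class ObjectInstance:
    """Hypothetical record for one object found by the perception stage."""
    label: str
    frame_ids: List[int]  # indices of the video frames where the object is visible


@dataclass
class Asset:
    """Hypothetical record for one generated single-object asset."""
    label: str
    source_frames: int  # number of frames the asset was generated from


@dataclass
class Scene:
    """Hypothetical container for the assembled scene."""
    assets: List[Asset] = field(default_factory=list)


def run_pipeline(
    frames: Sequence[str],
    perceive: Callable[[Sequence[str]], List[ObjectInstance]],
    generate: Callable[[ObjectInstance], Asset],
    assemble: Callable[[List[Asset]], Scene],
) -> Scene:
    """Chain the stages in the order given in the abstract."""
    instances = perceive(frames)                      # scene-level semantic reconstruction
    assets = [generate(inst) for inst in instances]   # single-object generation
    return assemble(assets)                           # assembly in the simulator


# Placeholder stages so the sketch runs end to end; they do no real work.
def stub_perceive(frames: Sequence[str]) -> List[ObjectInstance]:
    ids = list(range(len(frames)))
    return [ObjectInstance("chair", ids), ObjectInstance("table", ids[:1])]


def stub_generate(inst: ObjectInstance) -> Asset:
    return Asset(inst.label, len(inst.frame_ids))


def stub_assemble(assets: List[Asset]) -> Scene:
    return Scene(list(assets))


scene = run_pipeline(["f0", "f1", "f2"], stub_perceive, stub_generate, stub_assemble)
print([a.label for a in scene.assets])
```

Passing the stages in as callables keeps the ordering explicit and leaves room to insert extra steps between stages, which is where, per the abstract, the naive combination breaks down.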

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications