Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng, Wayne Wu, Bolei Zhou

TL;DR
Vid2Sim introduces a scalable framework that converts monocular videos into photorealistic 3D urban environments, significantly enhancing the training and deployment of navigation agents in real-world scenarios.
Contribution
This work presents a novel real2sim pipeline that bridges the sim2real gap by generating realistic 3D environments from videos for reinforcement learning.
Findings
Improves urban navigation success rate by 31.2% in digital twins.
Enhances real-world navigation success rate by 68.3%.
Demonstrates that video-based environment generation effectively bridges the sim2real gap.
Abstract
The sim-to-real gap has long posed a significant challenge for robot learning in simulation, preventing the deployment of learned models in the real world. Previous work has primarily focused on domain randomization and system identification to mitigate this gap. However, these methods are often limited by the inherent constraints of the simulation and graphics engines. In this work, we propose Vid2Sim, a novel framework that effectively bridges the sim2real gap through a scalable and cost-efficient real2sim pipeline for neural 3D scene reconstruction and simulation. Given a monocular video as input, Vid2Sim can generate photorealistic and physically interactable 3D simulation environments to enable the reinforcement learning of visual navigation agents in complex urban environments. Extensive experiments demonstrate that Vid2Sim significantly improves the performance of urban navigation in…
Taxonomy
Topics: Geographic Information Systems Studies · 3D Modeling in Geospatial Applications · Evacuation and Crowd Dynamics
