MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Wayne Wu, Honglin He, Jack He, Yiran Wang, Chenda Duan, Zhizheng Liu, Quanyi Li, Bolei Zhou

TL;DR
MetaUrban is a versatile simulation platform that enables research on AI-driven urban micromobility by creating diverse, interactive city environments to improve the safety and generalizability of mobile AI agents.
Contribution
This work introduces MetaUrban, a novel compositional simulation platform for urban micromobility research, supporting diverse scene generation and comprehensive evaluation of AI policies.
Findings
Heterogeneous mechanical structures affect AI policy learning.
Compositional environments enhance AI generalizability and safety.
Extensive evaluation demonstrates the platform's effectiveness.
Abstract
Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while robot dogs and humanoids have recently emerged in the street. Micromobility enabled by AI for short-distance travel in public urban spaces is a crucial component of the future transportation system. Ensuring the generalizability and safety of AI models maneuvering mobile machines is essential. In this work, we present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements,…
Peer Reviews
Decision · ICLR 2025 Spotlight
### Quality
- Design is robust
- Procedural generation system clearly enables extensive domain generation
- Hierarchical layout generation, scalable obstacle retrieval, and cohabitant populating are convincingly argued to be crucial and effective for generating diverse scenes
- Baselines are well-considered, and it's clear the benchmark is built for the current state of AI technologies and has put thought into the future of the field
- Extensive evaluation and well-posed questions
### Clar
### Clarity
- Paper currently has walls of text and lacks signposting; more subheadings, titled paragraphs, etc. would help
- Figures don't seem especially well-considered; they could use more thoughtful design rather than pasting various info and not much more
### Quality
- FMs are used creatively throughout the paper, but the paper could benefit from more material on how LLM failure modes are handled during generation
The proposed environments have been very carefully designed to represent real-world urban scenes. The paper is well-written and easy to follow. Further, the simulator and choice of tasks are well-positioned and well-motivated. There is an extensive Appendix that provides additional details, experiments, and context.
The performance of the simulator is a weakness. RGB rendering is, at best, 65 FPS, which is orders of magnitude lower than would be ideal and at least an order of magnitude lower than would be "good". I encourage the authors to see if there are features that can be turned off for the sake of performance if researchers want to make that trade-off. For instance, can visual fidelity be reduced (by disabling things like shadows), or can animations be disabled to increase performance? I see the valu
1. The platform supports varied human and mobile agent behaviors, providing a rich set of interactions that better simulate the dynamics of crowded urban settings for long-horizon tasks.
2. The procedural generation of diverse terrains and object placements creates highly varied environments to help train robust models.
3. The paper explores the impact of different mechanical designs on mobile machines.
1. The main goal of such a large-scale simulator is to eventually transfer learned models to the real world. However, there is limited experimentation with real robots, and details on how well models trained in the simulator generalize to real-world sensor data are unclear. More real-world testing is necessary to demonstrate effective transferability.
2. The paper does not specify if robots, pedestrians, and other mobility devices can be controlled via standard interfaces like ROS. Without ROS
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Evacuation and Crowd Dynamics
Methods: Emirates Airlines Office in Dubai
