EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
Xinjie Wang, Liu Liu, Yu Cao, Ruiqi Wu, Wenkang Qin, Dehui Wang, Wei Sui, Zhizhong Su

TL;DR
EmbodiedGen is a versatile platform that automatically generates high-quality, realistic 3D worlds with accurate physical properties, facilitating scalable training and evaluation for embodied AI tasks.
Contribution
The paper introduces EmbodiedGen, a comprehensive toolkit for scalable, controllable, and photorealistic 3D asset generation using generative AI, addressing the high cost and limited realism of traditional manual asset creation.
Findings
Enables low-cost, high-quality 3D asset generation
Supports diverse, interactive 3D world creation
Facilitates physical control and real-world scale integration
Abstract
Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low-cost accessibility, and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D computer graphics assets that are manually created and annotated, which suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable generation of high-quality, controllable, and photorealistic 3D assets with accurate physical properties and real-world scale in the Unified Robotics Description Format (URDF) at low cost. These…
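To make "physical properties and real-world scale in URDF" concrete, the sketch below writes a minimal single-link URDF for one asset using Python's standard library. It is an illustration only and not EmbodiedGen's exporter: the function names `box_inertia` and `build_urdf`, the mesh path, and the mug dimensions are all hypothetical, and the inertia is approximated with the textbook solid-box formula rather than computed from the mesh.

```python
import xml.etree.ElementTree as ET


def box_inertia(mass_kg, size_m):
    """Diagonal inertia of a solid box (assumed stand-in for the true mesh inertia)."""
    x, y, z = size_m
    k = mass_kg / 12.0
    return {
        "ixx": k * (y * y + z * z),
        "iyy": k * (x * x + z * z),
        "izz": k * (x * x + y * y),
    }


def build_urdf(name, mesh_path, mass_kg, size_m, mesh_scale=1.0):
    """Return a URDF string for a single rigid link (hypothetical helper).

    mass_kg    -- mass in kilograms
    size_m     -- bounding-box extents (x, y, z) in metres
    mesh_scale -- uniform factor mapping mesh units to metres
    """
    robot = ET.Element("robot", name=name)
    link = ET.SubElement(robot, "link", name=f"{name}_link")

    # Physical properties live under <inertial>.
    inertial = ET.SubElement(link, "inertial")
    ET.SubElement(inertial, "mass", value=f"{mass_kg:g}")
    diag = box_inertia(mass_kg, size_m)
    ET.SubElement(
        inertial, "inertia",
        ixx=f"{diag['ixx']:.6g}", ixy="0", ixz="0",
        iyy=f"{diag['iyy']:.6g}", iyz="0",
        izz=f"{diag['izz']:.6g}",
    )

    # The same mesh is reused for rendering and collision in this sketch.
    scale_attr = " ".join([f"{mesh_scale:g}"] * 3)
    for tag in ("visual", "collision"):
        geometry = ET.SubElement(ET.SubElement(link, tag), "geometry")
        ET.SubElement(geometry, "mesh", filename=mesh_path, scale=scale_attr)

    return ET.tostring(robot, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical mug: 0.3 kg, roughly 10 x 10 x 12 cm, mesh already in metres.
    print(build_urdf("mug", "meshes/mug.obj", 0.3, (0.10, 0.10, 0.12)))
```

A simulator that loads such a file gets the asset's mass, inertia, and metric scale alongside its geometry, which is the kind of information the abstract says generated assets should carry.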
Taxonomy
Topics: Human Motion and Animation · Action Observation and Synchronization · Multimodal Machine Learning Applications
