GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang

TL;DR
GenSim2 introduces a scalable framework that uses multi-modal, reasoning-capable LLMs to generate complex robotic simulation tasks. This substantially reduces the human effort needed and improves the zero-shot transfer of the trained policies.
Contribution
The paper presents a novel scalable pipeline and a multi-task policy architecture that together enable efficient generation and utilization of diverse simulation data for robotics.
Findings
Generated data covers up to 100 articulated tasks with 200 objects.
Co-training on generated and real-world data improves policy performance by 20% over training on limited real data alone.
Framework reduces human effort in creating diverse simulation environments.
Abstract
Robotic simulation today remains challenging to scale up due to the human efforts required to create diverse simulation tasks and scenes. Simulation-trained policies also face scalability issues as many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding LLMs with multi-modal and reasoning capabilities for complex and realistic simulation task creation, including long-horizon tasks with articulated objects. To automatically generate demonstration data for these tasks at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduce the required human efforts. To utilize such data, we propose an effective multi-task language-conditioned policy architecture, dubbed proprioceptive…
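The data-generation loop described in the abstract has three parts: tasks are proposed automatically, a solver is reused across objects of the same category, and only successful demonstrations are kept. The sketch below illustrates that loop under stated assumptions and is not the authors' implementation. `TaskSpec`, `propose_tasks`, `solve`, and `generate_dataset` are hypothetical names. The LLM call is replaced by a fixed template, and solver success is simulated with a seeded random draw.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class TaskSpec:
    # Hypothetical stand-in for an LLM-generated task description.
    name: str
    category: str          # object category the task applies to
    subgoals: list[str]    # long-horizon tasks decompose into subgoals


@dataclass
class Demo:
    task: str
    object_id: str
    actions: list[str]


def propose_tasks(categories: list[str]) -> list[TaskSpec]:
    # Placeholder for the LLM task-proposal step: no model is called here,
    # one templated task is emitted per object category.
    return [TaskSpec(f"open_{c}", c, ["reach_handle", "grasp", "pull"]) for c in categories]


def solve(task: TaskSpec, object_id: str, rng: random.Random,
          success_rate: float) -> Demo | None:
    # Placeholder solver: the same per-category plan is reused for every
    # object instance (the "generalize within object categories" idea);
    # whether the rollout succeeds is simulated with a random draw.
    if rng.random() >= success_rate:
        return None
    return Demo(task.name, object_id, list(task.subgoals))


def generate_dataset(objects_by_category: dict[str, list[str]],
                     demos_per_object: int, seed: int = 0,
                     success_rate: float = 0.8) -> list[Demo]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    dataset: list[Demo] = []
    for task in propose_tasks(list(objects_by_category)):
        for obj in objects_by_category[task.category]:
            for _ in range(demos_per_object):
                demo = solve(task, obj, rng, success_rate)
                if demo is not None:  # keep only successful rollouts
                    dataset.append(demo)
    return dataset


if __name__ == "__main__":
    objects = {"drawer": ["drawer_0", "drawer_1"], "microwave": ["microwave_0"]}
    data = generate_dataset(objects, demos_per_object=5)
    print(f"kept {len(data)} of 15 attempted demos")
```

The point of the structure is that scaling comes from the two inner loops. Adding a new object instance to an existing category requires no new task or solver code, only more solver rollouts.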
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
