IGen: Scalable Data Generation for Robot Learning from Open-World Images
Chenghao Gu, Haolan Kang, Junchao Lin, Jinghe Wang, Duo Wu, Shuzhao Xie, Fanding Huang, Junchen Ge, Ziyang Gong, Letian Li, Hongying Zheng, Changwei Lv, Zhi Wang

TL;DR
IGen is a scalable framework that generates realistic visual observations and executable actions from open-world images, enabling low-cost, large-scale robot learning data creation.
Contribution
IGen introduces a novel method to convert open-world images into structured 3D scenes and synthesize robot actions, bridging the gap between visual data and robot policy training.
Findings
Policies trained on IGen data perform comparably to policies trained on real-world data.
IGen produces high-quality visuomotor data from open-world images.
The framework effectively converts unstructured images into structured, actionable robot data.
Abstract
The rise of generalist robotic policies has created an exponential demand for large-scale training data. However, on-robot data collection is labor-intensive and often limited to specific environments. In contrast, open-world images capture a vast diversity of real-world scenes that naturally align with robotic manipulation tasks, offering a promising avenue for low-cost, large-scale robot data acquisition. Despite this potential, the lack of associated robot actions hinders the practical use of open-world images for robot learning, leaving this rich visual resource largely unexploited. To bridge this gap, we propose IGen, a framework that scalably generates realistic visual observations and executable actions from open-world images. IGen first converts unstructured 2D pixels into structured 3D scene representations suitable for scene understanding and manipulation. It then leverages…
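To make the first step in the abstract more concrete (lifting 2D pixels into a 3D representation), here is a minimal sketch of pinhole-camera back-projection. It is not IGen's actual pipeline, which the abstract does not detail. It assumes a per-pixel depth map and known camera intrinsics are already available; the function name `backproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift pixels with valid depth to camera-frame 3D points (pinhole model).

    Assumption: `depth[v][u]` holds metric depth along the optical axis for
    the pixel at column u, row v; non-positive values mark missing depth.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth estimate
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one missing value (0.0).
    toy_depth = [[1.0, 2.0], [0.0, 4.0]]
    for p in backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

A point cloud like this is only one possible structured 3D representation. A full system would also need depth estimation from a single image and object-level segmentation, neither of which is sketched here.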
