RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
Harold Haodong Chen, Sirui Chen, Yingjie Xu, Wenhang Ge, Ying-Cong Chen

TL;DR
RoboEvolve introduces a co-evolutionary framework that couples a vision-language model and a video generation model to make robotic manipulation more data-efficient and scalable, learning only from unlabeled images.
Contribution
It presents a novel dual-phase, self-supervised learning system that uses a curriculum to scale from simple to complex tasks, significantly reducing data requirements.
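The summary does not say how the curriculum decides when to move from simple to complex tasks. One common pattern is shown below as an illustration: promote the learner once its recent success rate at the current level clears a threshold. This is a minimal sketch under stated assumptions, not the authors' method. `ProgressiveCurriculum`, `record`, `promote_threshold`, and `window` are hypothetical names, and the threshold and window sizes are placeholder values rather than numbers from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProgressiveCurriculum:
    """Hypothetical curriculum scheduler (illustrative, not the authors' code).

    Tasks are grouped into difficulty levels 0..num_levels-1. The learner
    stays at a level until its success rate over the last `window`
    episodes reaches `promote_threshold`, then moves up one level.
    """
    num_levels: int
    promote_threshold: float = 0.7  # assumed value, not reported in the paper
    window: int = 20                # assumed size of the success-rate window
    level: int = 0
    history: List[bool] = field(default_factory=list)

    def record(self, success: bool) -> int:
        """Log one episode outcome and return the (possibly updated) level."""
        self.history.append(success)
        recent = self.history[-self.window:]
        ready = (
            len(recent) == self.window
            and sum(recent) / self.window >= self.promote_threshold
        )
        # Promote only if the window is full, the bar is met, and a harder level exists.
        if ready and self.level < self.num_levels - 1:
            self.level += 1
            self.history.clear()  # measure success afresh at the new level
        return self.level


if __name__ == "__main__":
    curriculum = ProgressiveCurriculum(num_levels=3, promote_threshold=0.75, window=4)
    outcomes = [True, True, True, False]
    print([curriculum.record(s) for s in outcomes])  # [0, 0, 0, 1]
```

Clearing the history on promotion keeps success rates from an easier level from carrying over into the decision at the harder one. At the top level the scheduler simply stays put.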
Findings
Achieves a 30-point improvement over base planners.
Increases the simulator's success rate by 48%.
Operates effectively with only 500 unlabeled seed images.
Abstract
The scalability of robotic manipulation is fundamentally bottlenecked by the scarcity of task-aligned physical interaction data. While vision-language models (VLMs) and video generation models (VGMs) hold promise for autonomous data synthesis, they suffer from semantic-spatial misalignment and physical hallucinations, respectively. To bridge this gap, we introduce RoboEvolve, a novel framework that couples a VLM planner and a VGM simulator into a mutually reinforcing co-evolutionary loop. Operating purely on unlabeled seed images, RoboEvolve leverages a cognitive-inspired dual-phase mechanism: (i) daytime exploration fosters physically grounded behavioral discovery through a semantic-controlled multi-granular reward, and (ii) nighttime consolidation mines "near-miss" failures to stabilize policy optimization. Guided by an autonomous progressive curriculum, the system naturally scales…
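The abstract says nighttime consolidation "mines 'near-miss' failures" but does not define a near-miss. The sketch below shows one plausible reading. It assumes each daytime rollout carries a scalar reward in [0, 1] and a success flag, and it treats a near-miss as a failed episode whose reward still clears a cutoff. `Rollout`, `mine_near_misses`, and `min_reward` are hypothetical names, and the 0.5 cutoff is an assumption. The semantic-controlled multi-granular reward itself is not modeled here.

```python
from typing import List, NamedTuple


class Rollout(NamedTuple):
    """One simulated episode (hypothetical record format)."""
    task_id: str
    reward: float  # assumed scalar reward in [0, 1]
    success: bool


def mine_near_misses(rollouts: List[Rollout], min_reward: float = 0.5) -> List[Rollout]:
    """Hypothetical 'nighttime' filter: keep failed episodes whose reward
    came close to success, ordered from closest to farthest."""
    near = [r for r in rollouts if not r.success and r.reward >= min_reward]
    return sorted(near, key=lambda r: r.reward, reverse=True)


if __name__ == "__main__":
    day_rollouts = [
        Rollout("stack_cups", 1.0, True),
        Rollout("open_drawer", 0.8, False),
        Rollout("pour_water", 0.2, False),
        Rollout("pick_apple", 0.6, False),
    ]
    print([r.task_id for r in mine_near_misses(day_rollouts)])
    # ['open_drawer', 'pick_apple']
```

Successes and far-off failures are both dropped. The intuition is that episodes that almost succeeded carry the most informative contrast for a consolidation step, but how RoboEvolve actually uses them is described only in the full paper.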
