RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation

Yixue Zhang; Kun Wu; Zhi Gao; Zhen Zhao; Pei Ren; Zhiyuan Xu; Fei Liao; Xinhua Wang; Shichao Fan; Di Wu; Qiuxuan Feng; Meng Li; Zhengping Che; Chang Liu; Jian Tang

arXiv:2602.16444·cs.RO·February 20, 2026

TL;DR

RoboGene is an agentic framework that automates the generation of diverse, feasible robotic manipulation tasks to enhance pre-training data quality, leading to improved robot learning and generalization in real-world scenarios.

Contribution

The paper introduces RoboGene, a framework that combines diversity-driven sampling, self-reflection, and a human-in-the-loop stage to generate high-quality, diverse robotic manipulation tasks for pre-training.

Findings

01

RoboGene outperforms existing foundation models in task quality and diversity.

02

Pre-training with RoboGene-generated data improves robot success rates.

03

Large-scale real-world experiments validate the effectiveness of RoboGene.

Abstract

The pursuit of general-purpose robotic manipulation is hindered by the scarcity of diverse, real-world interaction data. Unlike data collection from the web for vision or language, robotic data collection is an active process incurring prohibitive physical costs. Consequently, automated task curation to maximize data value remains a critical yet under-explored challenge. Existing manual methods are unscalable and biased toward common tasks, while off-the-shelf foundation models often hallucinate physically infeasible instructions. To address this, we introduce RoboGene, an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks across single-arm, dual-arm, and mobile robots. RoboGene integrates three core components: diversity-driven sampling for broad task coverage, self-reflection mechanisms to enforce physical constraints, and…
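The abstract names diversity-driven sampling and self-reflection as two of RoboGene's components, but this page does not describe how either works. The sketch below is a toy, hypothetical illustration of the general pattern only: sample under-represented task attributes more often, then reject candidates that fail a physical-feasibility check and retry. The attribute lists, the `is_feasible` rules, the inverse-frequency weighting, and every function name are assumptions made for illustration. None of them is the paper's method, which is described as an agentic framework rather than a set of fixed rules.

```python
import random
from collections import Counter

# Hypothetical task space; the paper's actual attribute vocabulary is not given here.
OBJECTS = ["cup", "towel", "drawer", "box"]
ACTIONS = ["pick", "fold", "open", "handover"]
EMBODIMENTS = ["single_arm", "dual_arm", "mobile"]

# Hypothetical physical constraints used by the toy feasibility check.
TWO_ARM_ACTIONS = {"fold", "handover"}
OPENABLE = {"drawer", "box"}
FOLDABLE = {"towel"}


def is_feasible(task):
    """Toy stand-in for self-reflection: reject tasks that break a simple physical rule."""
    obj, action, robot = task
    if action in TWO_ARM_ACTIONS and robot != "dual_arm":
        return False
    if action == "open" and obj not in OPENABLE:
        return False
    if action == "fold" and obj not in FOLDABLE:
        return False
    return True


def sample_task(counts, rng):
    """Diversity-driven sampling: favor actions and objects that have been used less often."""
    candidates = [(o, a, r) for o in OBJECTS for a in ACTIONS for r in EMBODIMENTS]
    weights = [
        1.0 / ((1 + counts[("action", a)]) * (1 + counts[("object", o)]))
        for (o, a, _) in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate_tasks(n, seed=0, max_tries=50):
    """Sample up to n unique tasks, retrying when a candidate fails the feasibility check."""
    rng = random.Random(seed)
    counts = Counter()
    tasks = []
    while len(tasks) < n:
        for _ in range(max_tries):
            task = sample_task(counts, rng)
            if is_feasible(task) and task not in tasks:
                break
        else:
            break  # no new feasible task found within max_tries; stop early
        tasks.append(task)
        counts[("action", task[1])] += 1
        counts[("object", task[0])] += 1
    return tasks
```

For example, `for task in generate_tasks(5): print(task)` prints five distinct (object, action, embodiment) triples that pass the toy checks. Separating the sampler from the checker mirrors the abstract's split between coverage and physical plausibility. In a real system, the rule-based `is_feasible` would presumably be replaced by a richer verifier.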

Code & Models

Datasets

X-Humanoid/RoboGene
Dataset · 6.9k downloads

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications