InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy

Yang Tian; Yuyin Yang; Yiman Xie; Zetao Cai; Xu Shi; Ning Gao; Hangxu Liu; Xuekun Jiang; Zherui Qiu; Feng Yuan; Yaping Li; Ping Wang; Junhao Cai; Jia Zeng; Hao Dong; Jiangmiao Pang

arXiv:2511.16651 · cs.RO · November 21, 2025

TL;DR

This paper demonstrates that large-scale synthetic data alone can effectively pre-train Vision-Language-Action models, achieving performance comparable to real-robot data and enabling zero-shot sim-to-real transfer in robotics.

Contribution

It introduces InternData-A1, a comprehensive synthetic dataset for robotic pre-training, and shows its effectiveness in matching real-robot pre-training performance across multiple tasks.

Findings

01 Synthetic data matches real-robot pre-training performance.

02 Zero-shot sim-to-real transfer achieved on challenging tasks.

03 Large-scale synthetic dataset enables scalable embodied AI research.

Abstract

Recent works explore how real and synthetic data contribute to Vision-Language-Action (VLA) models' generalization. While current VLA models have shown the strong effectiveness of large-scale real-robot pre-training, synthetic data has not previously demonstrated comparable capability at scale. This paper provides the first evidence that synthetic data alone can match the performance of the strongest $\pi$-dataset in pre-training a VLA model, revealing the substantial value of large-scale simulation. The resulting model also exhibits surprising zero-shot sim-to-real transfer on several challenging tasks. Our synthetic dataset, InternData-A1, contains over 630k trajectories and 7,433 hours across 4 embodiments, 18 skills, 70 tasks, and 227 scenes, covering rigid, articulated, deformable, and fluid-object manipulation. It is generated through a highly autonomous, fully decoupled, and…
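To give a sense of scale for the figures quoted in the abstract, the following is a rough back-of-the-envelope sketch rather than anything taken from the paper. It assumes the trajectory count is exactly 630,000 (the abstract says "over 630k", so the true mean duration is likely somewhat lower) and that trajectories are spread evenly across tasks, which the source does not claim. The function names are hypothetical.

```python
# Headline figures quoted in the abstract. Both totals are rounded in the
# source, so every result below is an estimate, not a reported statistic.
TRAJECTORIES = 630_000  # "over 630k trajectories", treated as a lower bound
HOURS = 7_433           # total recorded duration in hours
TASKS = 70              # number of tasks


def mean_trajectory_seconds(total_hours: float, n_trajectories: int) -> float:
    """Average trajectory length in seconds, assuming the totals are exact."""
    return total_hours * 3600 / n_trajectories


def mean_trajectories_per_task(n_trajectories: int, n_tasks: int) -> float:
    """Average trajectories per task, assuming an even split (an assumption)."""
    return n_trajectories / n_tasks


if __name__ == "__main__":
    secs = mean_trajectory_seconds(HOURS, TRAJECTORIES)
    per_task = mean_trajectories_per_task(TRAJECTORIES, TASKS)
    print(f"mean trajectory length: about {secs:.1f} s")
    print(f"mean trajectories per task: about {per_task:.0f}")
```

Under these assumptions the average trajectory lasts roughly 42 seconds and each task has on the order of 9,000 trajectories. The actual per-task distribution may be far from uniform.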

Code & Models

Datasets

InternRobotics/InternData-A1
dataset · 23k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis