A tutorial note on collecting simulated data for vision-language-action models

Heran Wu; Zirun Zhou; Jingfeng Zhang

arXiv:2508.06547 · cs.RO · August 12, 2025

Open Access

TL;DR

This paper reviews methods for generating and utilizing high-quality simulated datasets to train vision-language-action models in robotics, emphasizing simulation, benchmarking, and large-scale data collection.

Contribution

It reviews three representative existing systems for training VLA models: PyBullet for customized data generation, LIBERO for standardized task definition and evaluation, and RT-X for large-scale multi-robot data collection.

Findings

01 PyBullet enables flexible custom data simulation.

02 LIBERO provides standardized task benchmarks.

03 RT-X facilitates large-scale multi-robot data collection.

Abstract

Traditional robotic systems typically decompose intelligence into independent modules for computer vision, natural language processing, and motion control. Vision-Language-Action (VLA) models fundamentally transform this approach by employing a single neural network that can simultaneously process visual observations, understand human instructions, and directly output robot actions -- all within a unified framework. However, these systems are highly dependent on high-quality training datasets that can capture the complex relationships between visual observations, language instructions, and robotic actions. This tutorial reviews three representative systems: the PyBullet simulation framework for flexible customized data generation, the LIBERO benchmark suite for standardized task definition and evaluation, and the RT-X dataset collection for large-scale multi-robot data acquisition. We…
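To make the abstract's "visual observations, language instructions, and robotic actions" concrete, the following is a minimal sketch of how one demonstration might be stored. It is not taken from the paper: the names `Step`, `Episode`, and `blank_image`, the 7-value action layout, and the tiny placeholder frames are all hypothetical, and a real collector would fill the image from a simulator camera (for example in PyBullet) rather than with zeros.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One timestep: what the robot saw and what it did (hypothetical layout)."""
    image: List[List[List[int]]]  # H x W x 3 RGB pixels
    action: List[float]           # assumed: 6 pose deltas plus 1 gripper value


@dataclass
class Episode:
    """One demonstration: a language instruction plus its ordered steps."""
    instruction: str
    steps: List[Step] = field(default_factory=list)

    def add(self, image, action):
        # Copy the action so later mutation by the caller does not alter the record.
        self.steps.append(Step(image=image, action=list(action)))


def blank_image(height=2, width=2):
    """Placeholder black RGB frame standing in for a simulator camera render."""
    return [[[0, 0, 0] for _ in range(width)] for _ in range(height)]


episode = Episode(instruction="pick up the red block")
for t in range(3):
    # A real collector would render a frame and query a controller here.
    episode.add(blank_image(), [0.01 * t, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

print(len(episode.steps), episode.instruction)
```

Keeping the instruction at the episode level and the image-action pairs at the step level is one common arrangement; other datasets may attach the instruction to every step instead.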

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques