Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories

Rushuai Yang; Zhiyuan Feng; Tianxiang Zhang; Kaixin Wang; Chuheng Zhang; Li Zhao; Xiu Su; Yi Chen; and Jiang Bian

arXiv:2511.19528 · cs.RO · November 26, 2025

TL;DR

This paper introduces DLR, a pattern discovery framework that enhances the diversity of RL-generated trajectories for vision-language-action pretraining, leading to improved downstream performance and scalable data generation.

Contribution

DLR enables the generation of multiple distinct, high-success behavioral patterns in RL, significantly increasing trajectory diversity for VLA pretraining compared to standard RL methods.

Findings

01. DLR produces more diverse trajectories than standard RL.

02. Pretraining on DLR data improves downstream task performance.

03. DLR exhibits positive data-scaling behavior.

Abstract

Scaling vision-language-action (VLA) model pre-training requires large volumes of diverse, high-quality manipulation trajectories. Most current data is obtained via human teleoperation, which is expensive and difficult to scale. Reinforcement learning (RL) methods learn useful skills through autonomous exploration, making them a viable approach for generating data. However, standard RL training collapses to a narrow execution pattern, limiting its utility for large-scale pre-training. We propose Discover, Learn and Reinforce (DLR), an information-theoretic pattern discovery framework that generates multiple distinct, high-success behavioral patterns for VLA pretraining. Empirically, DLR generates a markedly more diverse trajectory corpus on LIBERO. Specifically, it learns multiple distinct, high-success strategies for the same task where standard RL discovers only one, and hence it…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI