RL-VLA$^3$: A Flexible and Asynchronous Reinforcement Learning Framework for VLA Training

Haoran Sun; Yongjian Guo; Zhong Guan; Shuai Di; Xiaodong Bai; Jing Long; Tianyun Zhao; Mingxi Luo; Hongke Zhao; Likang Wu; Xiaotie Deng; Xu Chu; Xi Xiao; Sheng Wen; Yicheng Gong; Junwu Xiong

arXiv:2602.05765 · cs.AI · April 8, 2026

TL;DR

RL-VLA$^3$ is an asynchronous reinforcement learning framework for training Vision-Language-Action (VLA) models. It improves throughput and scalability while tolerating the highly variable latencies of physical simulation environments.

Contribution

It introduces the first fully asynchronous RL framework tailored specifically to VLA training. Instead of alternating strictly between data collection and policy optimization, it lets simulation, inference, and training proceed asynchronously.

Findings

1. Achieves up to 85.2% throughput improvement over synchronous methods.
2. Maintains sample efficiency comparable to traditional approaches.
3. Scales effectively from 8 to 256 GPUs.
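The first finding can be translated into wall-clock time. This assumes "85.2% throughput improvement" means a relative gain in samples per second over the synchronous baseline; the summary itself does not define the metric. Under that reading, the time needed for a fixed sample budget works out as follows:

```latex
\[
\frac{T_{\text{async}}}{T_{\text{sync}}} = 1 + 0.852 = 1.852,
\qquad
\frac{t_{\text{async}}}{t_{\text{sync}}} = \frac{1}{1.852} \approx 0.540
\]
% T = throughput (samples per second), t = wall-clock time for a fixed number of samples.
% 1 - 0.540 = 0.460, i.e. roughly 46% less wall-clock time in the best reported case.
```

Because the figure is stated as an upper bound ("up to"), the saving in other configurations would be smaller.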

Abstract

Reinforcement learning (RL) has emerged as a critical paradigm for post-training Vision-Language-Action (VLA) models, enabling embodied agents to adapt and improve through environmental interaction. However, existing RL frameworks for VLAs inherit synchronous design principles from traditional LLM training, treating entire rollouts as indivisible units and alternating strictly between data collection and policy optimization. This fundamentally mismatches the unique characteristics of VLA training, as physical simulators introduce highly variable, resource-intensive latencies. To address this, we introduce RL-VLA$^3$, a fully asynchronous distributed RL framework that enables fine-grained asynchronous interaction between simulation, inference, and training components through dynamic batching schedulers and flexible environment sharding strategies. Extensive experiments across diverse…
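The toy model below illustrates why strict alternation hurts when simulator latency varies. It is not RL-VLA$^3$'s scheduler. The function names `sync_steps` and `async_steps` are hypothetical, the latencies are made-up time units, and the asynchronous side makes two simplifying assumptions: inference never queues, and training runs on separate GPUs. The model also ignores policy staleness. It counts environment steps completed by a time horizon under each schedule.

```python
"""Toy model (hypothetical, not RL-VLA^3 code): env steps completed by a time horizon."""


def sync_steps(latencies, infer_s, train_s, horizon):
    """Lockstep collection: each step waits for the slowest simulator, then training blocks collection."""
    t, done = 0.0, 0
    rollout_len = len(latencies[0])
    while True:
        for k in range(rollout_len):
            # Every environment waits at a barrier for the slowest one at step k,
            # then the whole batch goes through one inference call.
            t += max(env[k] for env in latencies) + infer_s
            if t > horizon:
                return done
            done += len(latencies)
        # Rollouts are indivisible: collection stops while the policy updates.
        t += train_s


def async_steps(latencies, infer_s, horizon):
    """Fine-grained async: each env advances on its own clock; training overlaps collection."""
    done = 0
    for env in latencies:
        t, k = 0.0, 0
        # Assumption: inference never queues and training runs on separate GPUs.
        while t + env[k % len(env)] + infer_s <= horizon:
            t += env[k % len(env)] + infer_s
            k += 1
        done += k
    return done


if __name__ == "__main__":
    # Hypothetical latencies (arbitrary time units): env 1 has a slow step every other step.
    lat = [[1, 1], [1, 3]]
    print(sync_steps(lat, infer_s=1, train_s=4, horizon=40))  # 16
    print(async_steps(lat, infer_s=1, horizon=40))  # 33
```

On these made-up numbers, the asynchronous schedule completes roughly twice as many steps (33 vs. 16). The gap comes entirely from removing the per-step barrier and the training pause. It is not an estimate of the paper's measured 85.2% gain.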