D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

Yucheng Guo; Yongjian Guo; Zhong Guan; Wen Huang; Haoran Sun; Haodong Yue; Xiaolong Xiang; Shuai Di; Zhen Sun; Luqiao Wang; Junwu Xiong; Yicheng Gong

arXiv:2605.13276 · cs.AI · May 15, 2026

TL;DR

D-VLA is a distributed RL framework that improves throughput and scalability when training large-scale Vision-Language-Action models by decoupling physical simulation from optimization and running the training pipeline asynchronously in parallel.

Contribution

It proposes "Plane Decoupling" and a "Swimlane" pipeline to resolve the resource conflict between physical simulation and model optimization, improving efficiency when training massive embodied AI models.

Findings

1. Significantly outperforms existing RL frameworks in throughput and sampling efficiency.
2. Maintains linear speedup and stability in trillion-parameter-scale tests.
3. Achieves high concurrency and low latency in large-scale distributed RL.

Abstract

The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous…
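The abstract describes Plane Decoupling only at a high level: training data moves at high frequency, weight updates move at low frequency, and the two paths are kept apart so simulation and optimization do not interfere. The following is a minimal single-process sketch of that separation under our own assumptions, not D-VLA's implementation: a bounded `queue.Queue` stands in for the data plane, a lock-protected versioned store stands in for the control plane, and Python threads stand in for simulation workers and the learner. All names (`ControlPlane`, `rollout_worker`, `learner`, `run_demo`) and parameters (`refresh_every`, `publish_every`) are hypothetical, and the "rewards" and "gradient step" are placeholders.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical control plane: low-frequency, versioned weight snapshots."""
    version: int = 0
    weights: float = 0.0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def publish(self, weights: float) -> None:
        with self._lock:
            self.version += 1
            self.weights = weights

    def snapshot(self) -> tuple[int, float]:
        with self._lock:
            return self.version, self.weights


def rollout_worker(worker_id: int, control: ControlPlane, data: queue.Queue,
                   num_steps: int, refresh_every: int) -> None:
    """Simulation side (hypothetical): emits one transition per step on the
    data plane and reads the control plane only every `refresh_every` steps."""
    for step in range(num_steps):
        if step % refresh_every == 0:
            version, weights = control.snapshot()
        # Placeholder for a simulator step plus policy inference with `weights`.
        reward = 0.01 * (worker_id + step)
        data.put((worker_id, version, reward))
    data.put(None)  # sentinel: this worker has finished


def learner(control: ControlPlane, data: queue.Queue, num_workers: int,
            batch_size: int, publish_every: int) -> tuple[int, int]:
    """Optimization side (hypothetical): drains the data plane in batches and
    publishes weights to the control plane every `publish_every` updates."""
    finished, consumed, updates = 0, 0, 0
    weights, batch = 0.0, []
    while finished < num_workers:
        item = data.get()
        if item is None:
            finished += 1
            continue
        batch.append(item)
        consumed += 1
        if len(batch) == batch_size:
            # Placeholder for a gradient step on the batch.
            weights += sum(r for _, _, r in batch) / batch_size
            batch.clear()
            updates += 1
            if updates % publish_every == 0:
                control.publish(weights)
    return consumed, updates


def run_demo(num_workers=4, steps_per_worker=50, batch_size=8,
             publish_every=2, refresh_every=10):
    control = ControlPlane()
    data: queue.Queue = queue.Queue(maxsize=64)  # bounded queue gives backpressure
    workers = [
        threading.Thread(target=rollout_worker,
                         args=(i, control, data, steps_per_worker, refresh_every))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    consumed, updates = learner(control, data, num_workers, batch_size, publish_every)
    for w in workers:
        w.join()
    return consumed, updates, control.version
```

With the defaults, `run_demo()` drains 200 transitions, performs 25 updates, and publishes 12 weight versions. The design choice being illustrated is that workers never wait on weight transfers inside the hot loop: they poll the control plane at their own low cadence, while the learner consumes the high-frequency data stream and the bounded queue throttles producers if it falls behind.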
