VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu, Wenkai Guo, Chubin Zhang, Yuheng Zhou, Haonan Jiang, Zifeng Gao, Yansong Tang, Ziwei Wang

TL;DR
VLA-RL introduces a reinforcement learning framework that enhances vision-language-action models for robotic manipulation, improving performance and robustness in out-of-distribution scenarios through online adaptation.
Contribution
The paper presents a novel RL-based approach for fine-tuning pretrained VLA models, including a trajectory-level formulation and reward modeling, to improve robotic manipulation in diverse tasks.
Findings
VLA-RL outperforms finetuned baselines by 4.5% on LIBERO tasks.
It matches the performance of advanced commercial models.
VLA-RL benefits from increased test-time optimization, suggesting early signs of inference scaling in robotics.
Abstract
Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states causes execution failures in out-of-distribution scenarios. Intuitively, an exploration-based method that improves on data collected online at test time could address this limitation. We present VLA-RL, an algorithmic and systematic framework that leverages online reinforcement learning (RL) to improve pretrained auto-regressive VLAs in downstream tasks. Within a unified perspective, we first introduce a trajectory-level RL formulation for auto-regressive VLA training, which models a general robotic manipulation trajectory as a multi-modal, multi-turn conversation. To address the challenge of sparse rewards, we fine-tune a pretrained vision-language model as…
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Modular Robots and Swarm Intelligence
