VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

Guanxing Lu; Wenkai Guo; Chubin Zhang; Yuheng Zhou; Haonan Jiang; Zifeng Gao; Yansong Tang; Ziwei Wang

arXiv:2505.18719·cs.RO·May 27, 2025

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

Guanxing Lu, Wenkai Guo, Chubin Zhang, Yuheng Zhou, Haonan Jiang, Zifeng Gao, Yansong Tang, Ziwei Wang

PDF

Open Access 1 Repo

TL;DR

VLA-RL introduces a reinforcement learning framework that enhances vision-language-action models for robotic manipulation, improving performance and robustness in out-of-distribution scenarios through online adaptation.

Contribution

The paper presents a novel RL-based approach for fine-tuning pretrained VLA models, including a trajectory-level formulation and reward modeling, to improve robotic manipulation in diverse tasks.

Findings

01

VLA-RL outperforms finetuned baselines by 4.5% on LIBERO tasks.

02

It matches the performance of advanced commercial models.

03

Test-time optimization benefits from increased inference scaling.

Abstract

Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states will cause execution failure in out-of-distribution scenarios. Intuitively, an exploration-based method that improves on online collected data at test time could address this limitation. We present VLA-RL, an algorithmic and systematic framework that leverages online reinforcement learning (RL) to improve pretrained auto-regressive VLAs in downstream tasks. Within a unified perspective, we first introduce a trajectory-level RL formulation for auto-regressive VLA training, which models general robotic manipulation trajectory as multi-modal multi-turn conversation. To address the challenge of sparse rewards, we fine-tune a pretrained vision-language model as…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

guanxinglu/vlarl
jaxOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsRobot Manipulation and Learning · Robotic Path Planning Algorithms · Modular Robots and Swarm Intelligence