What Can RL Bring to VLA Generalization? An Empirical Study

Jijia Liu; Feng Gao; Bingwen Wei; Xinlei Chen; Qingmin Liao; Yi Wu; Chao Yu; Yu Wang

arXiv:2505.19789 · cs.LG · January 15, 2026

TL;DR

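This study systematically evaluates how reinforcement learning (RL) fine-tuning, especially with PPO, affects the generalization of large Vision-Language Action (VLA) models across visual, semantic, and execution dimensions, and finds that it outperforms supervised fine-tuning (SFT) in semantic understanding and execution robustness.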

Contribution

The paper introduces a comprehensive benchmark for VLA generalization and demonstrates that RL fine-tuning, particularly with PPO, enhances semantic understanding and execution robustness over supervised fine-tuning.

Findings

01. RL fine-tuning with PPO improves semantic understanding.
02. PPO outperforms DPO and GRPO for VLAs.
03. A simple PPO training recipe enhances VLA generalization.
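
For readers unfamiliar with the PPO objective named in the findings, the following is a minimal sketch of the standard clipped surrogate loss. It is a generic textbook formulation, not the paper's training recipe: the function name `ppo_clip_loss`, the list-based inputs, and the clip range of 0.2 are all assumptions made for illustration.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps]
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate objectives
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the surrogate is maximized
    return -total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data when doing so would increase the objective, while still fully penalizing changes that make the objective worse.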

Abstract

Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while…

Videos

What Can RL Bring to VLA Generalization? An Empirical Study · SlidesLive

Taxonomy

Methods: Shrink and Fine-Tune · Entropy Regularization · Proximal Policy Optimization · Direct Preference Optimization