$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Kang Chen, Zhihao Liu, Tonghe Zhang, Zhen Guo, Si Xu, Hao Lin, Hongzhi Zang, Xiang Li, Quanlu Zhang, Zhaofei Yu, Guoliang Fan, Tiejun Huang, Yu Wang, and Chao Yu

arXiv:2510.25889·cs.LG·January 30, 2026

TL;DR

This paper introduces $\pi_\texttt{RL}$, a reinforcement learning framework for fine-tuning flow-based vision-language-action (VLA) models. It addresses the intractable action log-likelihoods of flow matching with two methods, Flow-Noise and Flow-SDE, and reports improved performance on robotic tasks.

Contribution

The paper proposes two techniques, Flow-Noise and Flow-SDE, that enable online RL fine-tuning of large-scale flow-based VLAs despite the intractable action log-likelihoods that arise from flow matching.

Findings

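01. RL fine-tuning improves performance across diverse benchmarks.

02. Flow-Noise enables exact log-likelihood computation by modeling denoising as a discrete-time MDP with a learnable noise network.

03. Flow-SDE enables efficient RL exploration through ODE-to-SDE conversion.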

Abstract

Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection process in scaling supervised fine-tuning (SFT), applying RL to large-scale flow-based VLAs (e.g., $\pi_{0}$, $\pi_{0.5}$) remains challenging due to the intractable action log-likelihoods arising from flow matching. We address this challenge with $\pi_\texttt{RL}$, featuring two technical approaches: (1) Flow-Noise models the denoising process as a discrete-time MDP with a learnable noise network for exact log-likelihood computation. (2) Flow-SDE integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs ODE-to-SDE conversion for efficient RL exploration. We evaluate $\pi_\texttt{RL}$ across various benchmarks, with…
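To make the Flow-Noise idea in the abstract more concrete, here is a minimal sketch of one way a denoising chain can have an exact log-likelihood: if every denoising step is a Gaussian transition whose mean is an Euler step of the flow and whose standard deviation comes from a noise network, then the log-likelihood of the whole chain (given its starting point) is the sum of per-step Gaussian log-densities. This is an illustration under stated assumptions, not the paper's implementation. The functions `velocity` and `noise_std` are hypothetical stand-ins for the learned velocity field and the learnable noise network, and the step rule and constants are chosen only for the example.

```python
import math
import random


def velocity(x, t):
    # Hypothetical stand-in for a learned flow-matching velocity field.
    return [-xi for xi in x]


def noise_std(x, t):
    # Hypothetical stand-in for a learnable noise network (must be > 0).
    return [0.1 + 0.05 * t for _ in x]


def gaussian_log_prob(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over dimensions.
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for xi, mi, si in zip(x, mean, std)
    )


def step_params(x, t, dt):
    # Mean is one Euler step of the flow; std comes from the noise network.
    mean = [xi + dt * vi for xi, vi in zip(x, velocity(x, t))]
    return mean, noise_std(x, t)


def sample_chain(x0, num_steps, rng):
    # Roll out the denoising chain, accumulating its log-likelihood.
    dt = 1.0 / num_steps
    chain = [list(x0)]
    total = 0.0
    for k in range(num_steps):
        mean, std = step_params(chain[-1], k * dt, dt)
        nxt = [rng.gauss(m, s) for m, s in zip(mean, std)]
        total += gaussian_log_prob(nxt, mean, std)
        chain.append(nxt)
    return chain, total


def chain_log_prob(chain):
    # Re-evaluate the log-likelihood of a stored chain, as a policy-gradient
    # update would do when scoring previously collected samples.
    num_steps = len(chain) - 1
    dt = 1.0 / num_steps
    total = 0.0
    for k in range(num_steps):
        mean, std = step_params(chain[k], k * dt, dt)
        total += gaussian_log_prob(chain[k + 1], mean, std)
    return total
```

A possible usage: `chain, logp = sample_chain([0.5, -0.2], 4, random.Random(0))` produces a five-state chain, and `chain_log_prob(chain)` recovers the same `logp`. In an RL setting, that re-evaluated quantity is the kind of term a likelihood-ratio objective would need; how the paper wires it into its update is not specified in the abstract.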

Code & Models

Models

🤗 RLinf/RLinf-Pi05-LIBERO-SFT
model · ♡ 2