ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Tonghe Zhang; Chao Yu; Sichang Su; Yu Wang

arXiv:2505.22094 · cs.RO · January 9, 2026

TL;DR

ReinFlow is an online RL framework that fine-tunes flow matching policies for continuous robotic control. It injects learnable noise into the flow's deterministic path, which supports exploration and stable training even at very few denoising steps, down to a single step.

Contribution

The paper presents an online RL fine-tuning method for flow matching policies, including Rectified Flow and Shortcut Model variants. The method remains effective at very few or even one denoising step and improves performance on locomotion and manipulation tasks.

Findings

01 135.36% average net growth in episode reward on locomotion tasks (Rectified Flow policies)

02 82.63% reduction in training time compared to DPPO

03 40.34% average increase in success rate on manipulation tasks

Abstract

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging…
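The noise-injection idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `velocity` and `noise_std` are hypothetical stand-ins for the pre-trained flow network and the learnable noise network, and the Euler discretization and Gaussian transition form are assumptions made for illustration. The point it shows is that once each denoising step adds Gaussian noise, every transition has a closed-form density, so the log-likelihood of the whole denoising chain is an exact sum of Gaussian log-densities.

```python
import numpy as np

def velocity(a, t, obs):
    # Hypothetical stand-in for a pre-trained flow policy's velocity field.
    return obs - a

def noise_std(a, t, obs):
    # Hypothetical stand-in for the learnable noise network (constant here).
    return 0.1 * np.ones_like(a)

def gaussian_logpdf(x, mean, std):
    # Log-density of a diagonal Gaussian, summed over the action dimension.
    z = (x - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi), axis=-1)

def sample_with_logprob(obs, num_steps, rng):
    """Run a noise-injected Euler rollout and return (action, chain log-prob)."""
    dt = 1.0 / num_steps
    a = rng.standard_normal(obs.shape)  # initial sample from N(0, I)
    logp = gaussian_logpdf(a, np.zeros_like(a), np.ones_like(a))
    for k in range(num_steps):
        t = k * dt
        mean = a + dt * velocity(a, t, obs)  # deterministic Euler step
        std = noise_std(a, t, obs)           # injected noise scale
        a_next = mean + std * rng.standard_normal(a.shape)
        logp = logp + gaussian_logpdf(a_next, mean, std)  # exact transition density
        a = a_next
    return a, logp

# Usage: a batch of 4 observations with 3-dimensional actions, one denoising step.
rng = np.random.default_rng(0)
obs = np.zeros((4, 3))
action, logp = sample_with_logprob(obs, num_steps=1, rng=rng)
```

In a policy-gradient setting, a log-probability of this kind is what would enter the importance ratio or score-function term. Because it remains well defined at `num_steps=1`, the sketch is consistent with the abstract's claim that fine-tuning works at a single denoising step.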

Videos

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning

Methods: Diffusion