ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang

TL;DR
ReinFlow is an online RL framework that fine-tunes flow matching policies for robotic control. By injecting learnable noise that enables stable exploration, it improves performance and efficiency on complex tasks and adapts quickly even with very few denoising steps.
Contribution
The paper presents an RL-based fine-tuning method for flow matching policies, including Rectified Flow and Shortcut Models. It adapts these policies effectively with few (even one) denoising steps and yields large performance gains on robotic locomotion and manipulation tasks.
Findings
135.36% average net increase in episode reward for Rectified Flow policies on locomotion tasks
82.63% reduction in training time compared to DPPO
40.34% average success rate increase in manipulation tasks
Abstract
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging…
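The central mechanism described above, injecting learnable noise so that each denoising step becomes a Gaussian transition with a tractable density, can be illustrated with a small sketch. It assumes an Euler discretization with K steps, x_{k+1} = x_k + v(x_k, t_k)·Δt + σ(x_k, t_k)·ε_k with ε_k ~ N(0, I), so the log-likelihood of the whole denoising chain is the sum of per-step Gaussian log-densities. The functions `velocity` and `noise_std` are hypothetical toy stand-ins for a trained velocity network and a learnable noise head; this is an illustration under those assumptions, not the authors' implementation.

```python
import math
import random

def velocity(x, t):
    # Hypothetical stand-in for a trained flow-matching velocity field v(x, t).
    return [-xi + t for xi in x]

def noise_std(x, t):
    # Hypothetical stand-in for a learnable noise head sigma(x, t); must stay positive.
    return [0.1 + 0.05 * t for _ in x]

def gaussian_logpdf(y, mean, std):
    # Log-density of a diagonal Gaussian, summed over dimensions.
    return sum(
        -0.5 * ((yi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for yi, mi, si in zip(y, mean, std)
    )

def step_mean_std(x, k, num_steps):
    # Mean and std of the k-th noisy Euler transition (assumed parameterization).
    dt = 1.0 / num_steps
    t = k * dt
    mean = [xi + vi * dt for xi, vi in zip(x, velocity(x, t))]
    return mean, noise_std(x, t)

def sample_with_logprob(x0, num_steps, rng):
    """Run num_steps noisy Euler steps from x0.

    Returns the final action, the full trajectory, and the chain's log-probability.
    The density of x0 itself is omitted: in this sketch it has no trainable parameters.
    """
    x = list(x0)
    traj = [list(x)]
    total_logp = 0.0
    for k in range(num_steps):
        mean, std = step_mean_std(x, k, num_steps)
        x = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        total_logp += gaussian_logpdf(x, mean, std)
        traj.append(list(x))
    return x, traj, total_logp

def chain_logprob(traj, num_steps):
    """Exactly re-score a stored trajectory, e.g. after a parameter update."""
    total = 0.0
    for k in range(num_steps):
        mean, std = step_mean_std(traj[k], k, num_steps)
        total += gaussian_logpdf(traj[k + 1], mean, std)
    return total

# Usage: sample a 2-D action with 4 denoising steps, then re-score it.
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(2)]
action, traj, logp = sample_with_logprob(x0, num_steps=4, rng=rng)
assert abs(logp - chain_logprob(traj, num_steps=4)) < 1e-9
```

Because every transition is an explicit Gaussian, the same log-density can be recomputed for stored trajectories, which is the quantity a policy-gradient likelihood ratio needs. With `num_steps=1` the chain reduces to a single Gaussian step, which mirrors the one-step setting mentioned in the abstract.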
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning
Methods: Diffusion
