Controllable Flow Matching for Online Reinforcement Learning

Bin Wang; Boxiang Tao; Haifeng Jing; Hongbo Dou; Zijian Wang

arXiv:2511.06816 · cs.LG · January 6, 2026

Open Access

TL;DR

This paper introduces CtrlFlow, a novel trajectory-level synthetic data generation method using conditional flow matching, which improves robustness and sample efficiency in online reinforcement learning without explicit environment modeling.

Contribution

The paper proposes CtrlFlow, an approach that models trajectory distributions directly instead of the environment transition function, with the aim of improving stability and performance in model-based reinforcement learning.

Findings

1. Outperforms traditional dynamics models on MuJoCo benchmarks
2. Achieves higher sample efficiency than standard MBRL methods
3. Enhances robustness and generalization across tasks

Abstract

Model-based reinforcement learning (MBRL) typically relies on modeling environment dynamics for data efficiency. However, because model errors accumulate over long-horizon rollouts, such methods often struggle to keep the learned model stable. To address this, we propose CtrlFlow, a trajectory-level synthetic data generation method using conditional flow matching (CFM), which directly models the distribution of trajectories from initial states to high-return terminal states without explicitly modeling the environment transition function. Our method ensures optimal trajectory sampling by minimizing the control energy governed by the non-linear Controllability Gramian Matrix, and the diverse trajectory data it generates significantly enhances the robustness and cross-task generalization of policy learning. In online settings, CtrlFlow demonstrates better performance on common MuJoCo…
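To make the conditional flow matching idea concrete, here is a minimal sketch of the generic CFM regression target on flattened trajectories. It assumes a straight-line probability path between a noise sample and a data sample, which is a common textbook choice and not necessarily the path used by CtrlFlow; it does not implement the control-energy or Controllability Gramian objective described in the abstract. The names `cfm_pair` and `cfm_loss`, the batch sizes, and the random stand-in data are all hypothetical.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Return the straight-line interpolant x_t and its target velocity.

    Assumption: linear path x_t = (1 - t) * x0 + t * x1, whose time
    derivative is the constant velocity x1 - x0.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # one time value per batch row
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def cfm_loss(v_pred, u_t):
    """Squared error between predicted and target velocities, averaged over the batch."""
    return float(np.mean(np.sum((v_pred - u_t) ** 2, axis=1)))

rng = np.random.default_rng(0)
batch, horizon, dim = 4, 5, 3                   # hypothetical sizes
x1 = rng.normal(size=(batch, horizon * dim))    # stand-in for flattened real trajectories
x0 = rng.normal(size=(batch, horizon * dim))    # noise samples
t = rng.uniform(size=batch)                     # random times in [0, 1)

x_t, u_t = cfm_pair(x0, x1, t)

# A trained model would predict v(x_t, t, condition); a zero predictor is
# used here only so the loss can be evaluated without a network.
loss = cfm_loss(np.zeros_like(u_t), u_t)
print(x_t.shape, loss)
```

In a full pipeline, the zero predictor would be replaced by a learned velocity network conditioned on quantities such as the initial state or a return target, and new trajectories would be sampled by integrating that velocity field from noise at t = 0 to t = 1.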

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning