Controllable Flow Matching for Online Reinforcement Learning
Bin Wang, Boxiang Tao, Haifeng Jing, Hongbo Dou, Zijian Wang

TL;DR
This paper introduces CtrlFlow, a novel trajectory-level synthetic data generation method using conditional flow matching, which improves robustness and sample efficiency in online reinforcement learning without explicit environment modeling.
Contribution
It proposes CtrlFlow, a new approach that models trajectory distributions directly, enhancing stability and performance in model-based reinforcement learning.
Findings
Outperforms traditional dynamics models on MuJoCo benchmarks
Achieves higher sample efficiency than standard MBRL methods
Enhances robustness and generalization across tasks
Abstract
Model-based reinforcement learning (MBRL) typically relies on modeling environment dynamics to achieve data efficiency. However, because model errors accumulate over long-horizon rollouts, such methods often struggle to keep the learned model stable. To address this, we propose CtrlFlow, a trajectory-level synthesis method based on conditional flow matching (CFM) that directly models the distribution of trajectories from initial states to high-return terminal states, without explicitly modeling the environment transition function. Our method ensures optimal trajectory sampling by minimizing the control energy governed by the non-linear Controllability Gramian Matrix, while the diverse generated trajectory data significantly enhances the robustness and cross-task generalization of policy learning. In online settings, CtrlFlow demonstrates better performance on common MuJoCo…
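To make the conditional flow matching idea in the abstract concrete, here is a minimal NumPy sketch of the generic linear-path CFM objective and an Euler sampler. It is not CtrlFlow itself: the abstract does not specify the probability path, the conditioning signal, or the network, so the straight-line path, the `cond` argument, and the function names `cfm_training_pair`, `cfm_loss`, and `euler_sample` are all assumptions made for illustration. The control-energy term based on the Controllability Gramian is not modeled here.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build one CFM regression pair on a straight-line path (an assumption).

    x0: source samples, shape (batch, dim), e.g. noise.
    x1: target samples, shape (batch, dim), e.g. flattened trajectories.
    t:  times in [0, 1], shape (batch,).
    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t


def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity."""
    x_t, u_t = cfm_training_pair(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=-1)))


def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with explicit Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network.

    When cond is the target itself, the straight-line path has velocity
    (cond - x) / (1 - t), which equals x1 - x0 along the path.
    """
    return (cond - x) / (1.0 - t[:, None])


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 6))        # source samples
x1 = rng.normal(size=(4, 6))        # stand-in for target trajectories
t = rng.uniform(0.0, 0.9, size=4)   # kept below 1 so the oracle stays finite

print(cfm_loss(oracle_velocity, x0, x1, t, cond=x1))
print(np.allclose(euler_sample(oracle_velocity, x0, cond=x1), x1))
```

In practice `velocity_fn` would be a learned model trained by minimizing `cfm_loss` over sampled pairs, and `cond` would carry whatever the method conditions on (for example an initial state or a return target); both choices are left open here because the abstract does not state them.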
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
