Flow Matching Policy with Entropy Regularization
Ting Gao, Stavros Orfanoudakis, Nan Lin, Elvin Isufi, Winnie Daamen, Serge Hoogendoorn

TL;DR
The paper introduces FMER, an ODE-based reinforcement learning framework that improves policy exploration and efficiency by using flow matching and entropy regularization, outperforming existing diffusion-based methods.
Contribution
FMER is a novel ODE-based RL approach that incorporates entropy regularization and flow matching, enabling efficient and effective policy learning with better exploration.
Findings
FMER outperforms state-of-the-art methods on sparse multi-goal benchmarks.
FMER cuts training time by a factor of 7 relative to diffusion baselines.
FMER maintains competitive performance on standard benchmarks.
Abstract
Diffusion-based policies have gained significant popularity in Reinforcement Learning (RL) due to their ability to represent complex, non-Gaussian distributions. Stochastic Differential Equation (SDE)-based diffusion policies often rely on indirect entropy control due to the intractability of the exact entropy, while also suffering from computationally prohibitive policy gradients through the iterative denoising chain. To overcome these issues, we propose Flow Matching Policy with Entropy Regularization (FMER), an Ordinary Differential Equation (ODE)-based online RL framework. FMER parameterizes the policy via flow matching and samples actions along a straight probability path, motivated by optimal transport. FMER leverages the model's generative nature to construct an advantage-weighted target velocity field from a candidate set, steering policy updates toward high-value regions. By…
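The ingredients named in the abstract (a straight probability path, an ODE sampler, and an advantage-weighted target velocity built from a candidate set) can be illustrated with a minimal NumPy sketch. This is generic flow matching, not the authors' implementation: every function name is hypothetical, and the softmax weighting with a temperature is an assumption, since the abstract does not say how the advantages are turned into weights.

```python
import numpy as np

def straight_path(x0, x1, t):
    """Point at time t in [0, 1] on the straight line from noise x0 to action x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return x1 - x0

def advantage_weights(advantages, temperature=1.0):
    """Softmax weights over candidates (assumed weighting, not from the paper)."""
    a = np.asarray(advantages, dtype=float)
    w = np.exp((a - a.max()) / temperature)  # subtract max for numerical stability
    return w / w.sum()

def weighted_target_velocity(x0, candidates, advantages, temperature=1.0):
    """Weighted average of straight-path velocities toward each candidate action."""
    w = advantage_weights(advantages, temperature)   # shape (K,)
    return w @ (np.asarray(candidates) - x0)         # shape (d,)

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x
```

In a learned policy, `velocity_fn` would be a state-conditioned network regressed onto such targets. Because the straight-path velocity is constant in time, Euler integration recovers the endpoint with only a few steps when the field is exact.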
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
