Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching

Zeqiao Li; Yijing Wang; Haoyu Wang; Zheng Li; Zhiqiang Zuo

arXiv:2602.01606 · cs.LG · February 3, 2026

TL;DR

This paper introduces FLAME, a framework that enhances maximum entropy reinforcement learning with one-step flow matching. It yields expressive policies with lower inference latency than multi-step diffusion policies, along with improved exploration.

Contribution

FLAME introduces a Q-Reweighted flow matching objective and a bias-corrected entropy estimator, and integrates MeanFlow for efficient one-step control in MaxEnt RL.
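To see why one-step generation needs something beyond a plain flow, here is a toy sketch under explicit assumptions. It uses the MeanFlow convention of a linear path z_t = (1 − t)x + tε, with data at t = 0 and noise at t = 1, and the one-step rule z_0 = z_1 − u(z_1, 0, 1), where u is the average velocity over [0, 1]. For a 1-D Gaussian action distribution N(μ, σ²), both the instantaneous velocity v and the average velocity u have closed forms, so no network is trained. In MeanFlow, a network u_θ(z, r, t) would approximate u. FLAME's actual architecture, action squashing and state conditioning are not shown, and all function names are hypothetical.

```python
import math


def path_stats(t, mu, sigma):
    """Mean and std of z_t = (1 - t) * x + t * eps, x ~ N(mu, sigma^2), eps ~ N(0, 1)."""
    mean = (1.0 - t) * mu
    std = math.sqrt((1.0 - t) ** 2 * sigma ** 2 + t ** 2)
    return mean, std


def instantaneous_velocity(z, t, mu, sigma):
    """Exact marginal velocity v(z, t) of the toy Gaussian path."""
    mean, std = path_stats(t, mu, sigma)
    xi = (z - mean) / std          # standardized position, constant along a trajectory
    d_mean = -mu                   # d/dt of (1 - t) * mu
    d_std = (t - (1.0 - t) * sigma ** 2) / std
    return d_mean + d_std * xi


def average_velocity(z1, mu, sigma):
    """Exact average velocity u(z_1, r=0, t=1) = z_1 - z_0, with z_0 = mu + sigma * z_1."""
    return (1.0 - sigma) * z1 - mu


def meanflow_one_step(z1, mu, sigma):
    """One network-evaluation sample: z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)."""
    return z1 - average_velocity(z1, mu, sigma)


def euler_sample(z1, mu, sigma, n_steps):
    """Integrate dz/dt = v(z, t) backward from t = 1 to t = 0 with explicit Euler steps."""
    z, dt = z1, 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt           # current time, avoids accumulating float drift
        z -= dt * instantaneous_velocity(z, t, mu, sigma)
    return z


if __name__ == "__main__":
    mu, sigma, z1 = 0.5, 0.2, 1.0  # z1 plays the role of a noise draw
    print(round(meanflow_one_step(z1, mu, sigma), 4))      # 0.7
    print(round(euler_sample(z1, mu, sigma, n_steps=1), 4))  # 0.5
    print(round(euler_sample(z1, mu, sigma, n_steps=1000), 4))
```

A single Euler step with the instantaneous velocity maps every noise draw to the mean μ, which discards the policy's spread. The average-velocity step reproduces z_0 = μ + σz_1 exactly. Euler only approaches that value as the step count grows, and each extra step costs another network evaluation.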

Findings

01. FLAME outperforms Gaussian-policy baselines on MuJoCo tasks.
02. FLAME matches the performance of multi-step diffusion policies at lower inference cost.
03. The proposed components improve exploration and policy expressiveness.

Abstract

Diffusion policies are expressive yet incur high inference latency. Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL) is challenging: the optimal policy is an intractable energy-based distribution, and the efficient log-likelihood estimation required to balance exploration and exploitation suffers from severe discretization bias. We propose Flow-based Log-likelihood-Aware Maximum Entropy RL (FLAME), a principled framework that addresses these challenges. First, we derive a Q-Reweighted FM objective that bypasses partition function estimation via importance reweighting. Second, we design a decoupled entropy estimator that rigorously corrects bias, which enables efficient exploration and brings the policy closer to the optimal MaxEnt policy. Third, we integrate…
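The abstract does not spell out the Q-Reweighted FM objective. The following is therefore only a minimal sketch of one plausible reading, and its function names are hypothetical. It makes four assumptions:

- The MaxEnt target is π*(a|s) ∝ exp(Q(s,a)/α).
- Candidate actions a_i come from a known proposal with log-density log q(a_i).
- The importance weights are self-normalized, w_i ∝ exp(Q(s,a_i)/α)/q(a_i).
- The state s is folded into `velocity_fn`.

Normalizing the weights over the batch cancels the partition function Z(s), which is the property the abstract highlights. The weights then scale a standard conditional flow matching loss on the linear path z_t = (1 − t)a + tε, whose target velocity is ε − a.

```python
import numpy as np


def self_normalized_weights(q_values, alpha, log_proposal=None):
    """w_i proportional to exp(Q(s, a_i) / alpha) / q(a_i), normalized to sum to 1.

    Normalizing over the batch cancels the unknown partition function Z(s).
    """
    logits = np.asarray(q_values, dtype=float) / alpha
    if log_proposal is not None:
        logits = logits - np.asarray(log_proposal, dtype=float)  # divide by q(a_i)
    logits = logits - logits.max()  # stabilize exp (log-sum-exp trick)
    w = np.exp(logits)
    return w / w.sum()


def q_reweighted_fm_loss(velocity_fn, actions, weights, rng):
    """Importance-weighted conditional FM loss on z_t = (1 - t) * a + t * eps.

    velocity_fn(z_t, t) -> predicted velocity with the same shape as z_t (hypothetical model).
    actions: array of shape (N, action_dim); weights: array of shape (N,).
    """
    actions = np.asarray(actions, dtype=float)
    n = actions.shape[0]
    eps = rng.standard_normal(actions.shape)   # noise endpoint at t = 1
    t = rng.uniform(size=(n, 1))               # one time per sample, broadcast over action dims
    z_t = (1.0 - t) * actions + t * eps
    target = eps - actions                     # dz_t/dt along the conditional path
    sq_err = np.sum((velocity_fn(z_t, t) - target) ** 2, axis=1)
    return float(np.sum(weights * sq_err))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.array([1.0, 2.0, 3.0])
    w = self_normalized_weights(q, alpha=1.0)
    # Shifting every Q-value by a constant leaves the weights unchanged.
    print(np.allclose(w, self_normalized_weights(q + 100.0, alpha=1.0)))  # True
    actions = rng.uniform(-1.0, 1.0, size=(3, 2))
    print(q_reweighted_fm_loss(lambda z, t: np.zeros_like(z), actions, w, rng))
```

Adding a constant to every Q-value, which is equivalent to changing Z(s), leaves the weights unchanged. This is why no partition-function estimate is needed. Self-normalized weights are biased for finite batches, and the proposal distribution here is an assumption rather than the one FLAME uses.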

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning