Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
Zeyuan Wang, Da Li, Yulin Chen, Yuehu Gong, Yanming Guo, Ye Shi, Liang Bai, Tianyuan Yu, Yanwei Fu

TL;DR
This paper introduces Stochastic MeanFlow Policies, a new generative policy class for reinforcement learning that combines expressive multimodal action distributions with stable, efficient one-step inference, improving exploration and performance.
Contribution
The paper proposes Stochastic MeanFlow Policies (SMFP), a one-step generative policy class built on MeanFlow transformations that admits tractable entropy estimation and stable off-policy mirror descent training.
Findings
SMFP outperforms Gaussian and generative baselines on MuJoCo benchmarks.
SMFP retains single-step inference efficiency while providing expressive action distributions.
Combining entropy regularization with a mirror descent (MD) constraint stabilizes policy improvement while still supporting exploration.
Abstract
Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Generative policies are more expressive, but often require iterative sampling or lack tractable entropy estimates. On the optimisation side, SAC-style soft policy improvement and mirror descent (MD) can be viewed as minimising different KL divergences: the former moves the policy towards a value-induced Boltzmann distribution, while the latter regularises each update against the previous policy. Combining entropy regularisation with an MD constraint is therefore attractive, as it supports exploration while stabilising policy improvement; however, the resulting target can be multimodal and is poorly matched by unimodal Gaussian policies. We propose Stochastic MeanFlow…
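The abstract's point that an entropy-regularised update with an MD constraint can yield a multimodal target may be easier to see in a small discrete-action example. The sketch below is not the paper's method. It assumes a finite action set and the standard objective E_pi[Q] + alpha * H(pi) - lam * KL(pi || pi_prev), whose maximiser has the closed form pi(a) proportional to pi_prev(a)^(lam / (alpha + lam)) * exp(Q(a) / (alpha + lam)). The function name `soft_md_target` and the parameters `alpha` and `lam` are hypothetical names chosen for illustration.

```python
import math


def soft_md_target(q_values, prev_probs, alpha, lam):
    """Maximiser of E_pi[Q] + alpha*H(pi) - lam*KL(pi || pi_prev) on a finite action set.

    Illustrative assumption: actions are discrete and every prev_probs entry is > 0.
    alpha weights the entropy bonus, lam weights the mirror descent (KL) constraint.
    """
    scale = alpha + lam
    # Unnormalised log-probabilities: (Q(a) + lam * log pi_prev(a)) / (alpha + lam).
    logits = [(q + lam * math.log(p)) / scale for q, p in zip(q_values, prev_probs)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Two equally good actions separated by a worse one: the target is bimodal.
q = [1.0, 0.0, 1.0]
prev = [1 / 3, 1 / 3, 1 / 3]
target = soft_md_target(q, prev, alpha=0.5, lam=0.5)

# With lam = 0 the target reduces to the SAC-style Boltzmann distribution exp(Q / alpha).
boltzmann = soft_md_target(q, prev, alpha=0.5, lam=0.0)

# With a very large lam the target stays close to the previous policy.
anchored = soft_md_target(q, [0.7, 0.2, 0.1], alpha=0.5, lam=1e6)
```

In this toy case the target puts equal mass on the first and third actions, which a single unimodal distribution centred between them would represent poorly. That is the mismatch the abstract attributes to Gaussian policies.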
