Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent

Zeyuan Wang; Da Li; Yulin Chen; Yuehu Gong; Yanming Guo; Ye Shi; Liang Bai; Tianyuan Yu; Yanwei Fu

arXiv:2605.21282 · cs.LG · May 22, 2026

TL;DR

This paper introduces Stochastic MeanFlow Policies, a new generative policy class for reinforcement learning that combines expressive multimodal action distributions with stable, efficient one-step inference, improving exploration and performance.

Contribution

The paper proposes SMFP, a novel one-step generative policy using MeanFlow transformations, enabling tractable entropy estimation and stable off-policy mirror descent training.

Findings

01 SMFP outperforms Gaussian and generative baselines on MuJoCo benchmarks.

02 SMFP retains single-step inference efficiency while providing expressive action distributions.

03 The approach stabilizes policy improvement by combining entropy regularization with a mirror descent (MD) constraint.

Abstract

Online off-policy reinforcement learning (RL) is shaped by two coupled choices: the policy class and the update rule. Gaussian policies are fast and have tractable entropy, but struggle with multimodal action distributions. Generative policies are more expressive, but often require iterative sampling or lack tractable entropy estimates. On the optimisation side, SAC-style soft policy improvement and mirror descent (MD) can be viewed as minimising different KL divergences: the former moves the policy towards a value-induced Boltzmann distribution, while the latter regularises each update against the previous policy. Combining entropy regularisation with an MD constraint is therefore attractive, as it supports exploration while stabilising policy improvement; however, the resulting target can be multimodal and is poorly matched by unimodal Gaussian policies. We propose Stochastic MeanFlow…
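The two KL views mentioned in the abstract can be written out in their standard textbook forms. This is a sketch under assumed notation, not necessarily the paper's own objective: $Q(s,a)$ is a (soft) action value, $\alpha$ an entropy temperature, $\lambda$ an MD penalty weight, $\pi_k$ the previous policy, and $Z_\alpha(s)$ a normalising constant.

```latex
% SAC-style soft policy improvement: project onto the value-induced Boltzmann distribution
\pi_{k+1}(\cdot \mid s)
  = \arg\min_{\pi}\;
    \mathrm{KL}\!\left( \pi(\cdot \mid s) \,\Big\|\, \frac{\exp\!\big(Q(s,\cdot)/\alpha\big)}{Z_\alpha(s)} \right)

% Mirror descent: improve the value while staying close to the previous policy
\pi_{k+1}(\cdot \mid s)
  = \arg\max_{\pi}\;
    \mathbb{E}_{a \sim \pi}\big[ Q(s,a) \big]
    - \lambda\, \mathrm{KL}\!\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big),
\qquad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, \exp\!\big(Q(s,a)/\lambda\big)

% Entropy regularisation combined with the MD constraint (assumed combined form)
\pi_{k+1}(\cdot \mid s)
  = \arg\max_{\pi}\;
    \mathbb{E}_{a \sim \pi}\big[ Q(s,a) \big]
    + \alpha\, \mathcal{H}\big( \pi(\cdot \mid s) \big)
    - \lambda\, \mathrm{KL}\!\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big),
\qquad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)^{\frac{\lambda}{\alpha+\lambda}}\,
  \exp\!\left( \frac{Q(s,a)}{\alpha+\lambda} \right)
```

Under these assumptions the combined target is a tempered copy of the previous policy multiplied by a Boltzmann factor, so it can carry several modes whenever $Q(s,\cdot)$ does. That is the mismatch with unimodal Gaussian policies that the abstract points to.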
