Mean Flow Policy Optimization

Xiaoyi Dong; Xi Sheryl Zhang; Jian Cheng

arXiv:2604.14698·cs.LG·April 17, 2026


TL;DR

Mean Flow Policy Optimization (MFPO) represents reinforcement learning policies with MeanFlow models, a class of few-step flow-based generative models. It matches or exceeds diffusion-based baselines while reducing training and inference time.

Contribution

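The paper proposes MeanFlow models as policy representations for online RL, which makes training and inference faster than with diffusion-based policies while maintaining performance. It optimizes these policies under the maximum entropy RL framework via soft policy iteration, addressing two challenges specific to MeanFlow policies: action likelihood evaluation and soft policy improvement.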
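
For orientation, the maximum entropy RL objective is usually written as below, together with the soft Bellman backup used in soft policy iteration. This is the standard textbook form (as in soft actor-critic style methods), not a formula quoted from the paper; the symbols \alpha (temperature), \gamma (discount), and Q (soft action value) are assumed notation. The \log \pi term is what makes action likelihood evaluation necessary for any policy class trained this way.

```latex
% Maximum entropy objective (standard form; notation assumed, not taken from the paper)
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]

% Soft policy evaluation backup (standard form)
Q(s_t, a_t) \leftarrow r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}
  \Big[ \mathbb{E}_{a_{t+1} \sim \pi}\big[ Q(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1}) \big] \Big]
```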

Findings

01  MFPO matches or exceeds the performance of diffusion-based RL baselines.

02  MFPO significantly reduces training and inference time.

03  Experiments on MuJoCo and DeepMind Control Suite benchmarks validate its effectiveness.

Abstract

Diffusion models have recently emerged as expressive policy representations for online reinforcement learning (RL). However, their iterative generative processes introduce substantial training and inference overhead. To overcome this limitation, we propose to represent policies using MeanFlow models, a class of few-step flow-based generative models, to improve training and inference efficiency over diffusion-based RL approaches. To promote exploration, we optimize MeanFlow policies under the maximum entropy RL framework via soft policy iteration, and address two key challenges specific to MeanFlow policies: action likelihood evaluation and soft policy improvement. Experiments on MuJoCo and DeepMind Control Suite benchmarks demonstrate that our method, Mean Flow Policy Optimization (MFPO), achieves performance comparable to or exceeding current diffusion-based baselines while…
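
To illustrate why a few-step flow model is cheap to sample from, here is a minimal sketch of MeanFlow-style action sampling. It assumes the usual MeanFlow convention (noise at t = 1, data at t = 0, update z_r = z_t - (t - r) * u(z_t, r, t)); whether MFPO uses exactly this convention is an assumption. The function `toy_mean_velocity` is a hypothetical closed-form stand-in for a trained network, and the toy target `tanh(state)` is invented purely so the example runs.

```python
import numpy as np


def toy_mean_velocity(z, r, t, state):
    """Hypothetical stand-in for a learned average-velocity network u(z, r, t | state).

    For a straight-line path from noise (t=1) to a single target action
    mu(state) (t=0), the average velocity over [r, t] is (z - mu) / t.
    """
    mu = np.tanh(state)  # toy target action; not from the paper
    return (z - mu) / t


def sample_action(state, mean_velocity, num_steps=1, rng=None):
    """Draw an action with `num_steps` network evaluations.

    Starts from Gaussian noise at t=1 and applies
    z_r = z_t - (t - r) * u(z_t, r, t) on a uniform time grid down to t=0.
    In this toy setup the action has the same shape as the state.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(state.shape)
    times = np.linspace(1.0, 0.0, num_steps + 1)
    for t, r in zip(times[:-1], times[1:]):
        z = z - (t - r) * mean_velocity(z, r, t, state)
    return z


state = np.array([0.5, -1.0])
one_step = sample_action(state, toy_mean_velocity, num_steps=1,
                         rng=np.random.default_rng(0))
four_step = sample_action(state, toy_mean_velocity, num_steps=4,
                          rng=np.random.default_rng(1))
```

With an exact average velocity, one step already lands on the target, which is the property that lets few-step policies avoid the long iterative sampling chains of diffusion policies. A learned network would only approximate this.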


Code & Models

Repositories

MFPolicy/MFPO (GitHub)
