OM2P: Offline Multi-Agent Mean-Flow Policy

Zhuoran Li; Xun Wang; Hai Zhong; Qingxin Xia; Lihua Zhang; Longbo Huang

arXiv:2508.06269 · cs.LG · March 2, 2026

TL;DR

OM2P is an offline multi-agent reinforcement learning (MARL) algorithm built on mean-flow models. It samples actions in a single step, reduces GPU memory usage by up to 3.8x, speeds up training by up to 10.8x, and reports superior performance on Multi-Agent Particle and MuJoCo benchmark tasks.

Contribution

OM2P is presented as the first method to integrate mean-flow generative models into offline MARL. This enables one-step action sampling and avoids the iterative generation process of diffusion and flow-based policies.

Findings

01. Reduces GPU memory usage by up to 3.8x.

02. Trains up to 10.8x faster.

03. Demonstrates superior performance on Multi-Agent Particle and MuJoCo benchmarks.

Abstract

Generative models, especially diffusion and flow-based models, have shown promise in offline multi-agent reinforcement learning. However, integrating powerful generative models into this framework poses unique challenges. In particular, diffusion and flow-based policies suffer from low sampling efficiency due to their iterative generation processes, making them impractical in time-sensitive or resource-constrained settings. To tackle these difficulties, we propose OM2P (Offline Multi-Agent Mean-Flow Policy), a novel offline MARL algorithm that achieves efficient one-step action sampling. To address the misalignment between generative objectives and reward maximization, we introduce a reward-aware optimization scheme that integrates a carefully designed mean-flow matching loss with Q-function supervision. Additionally, we design a generalized timestep distribution and a derivative-free…
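The contrast the abstract draws between iterative generation and one-step sampling can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the common mean-flow convention in which noise sits at t = 1, data at t = 0, and an average velocity u(z_t, r, t) satisfies z_r = z_t - (t - r) u(z_t, r, t). The closed-form velocity fields, the single target action, and every name below (`TARGET_ACTION`, `CountingField`, `sample_iterative`, `sample_one_step`) are hypothetical stand-ins for learned networks.

```python
import numpy as np

# Hypothetical single "dataset action" that the toy flow transports noise to.
TARGET_ACTION = np.array([0.5, -0.2])


class CountingField:
    """Wraps a velocity function and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def instantaneous_velocity(z, t):
    # Toy closed form for the path z_t = (1 - t) * a + t * noise with a
    # point target a. A learned flow policy would use a network here.
    return (z - TARGET_ACTION) / t


def average_velocity(z, r, t):
    # For this straight-line toy path the velocity is constant along the
    # trajectory, so its average over [r, t] equals the instantaneous value.
    return (z - TARGET_ACTION) / t


def sample_iterative(velocity, noise, num_steps):
    # Euler integration from t = 1 (noise) down to t = 0 (action):
    # one velocity evaluation per step.
    z = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = (num_steps - k) / num_steps
        z = z - dt * velocity(z, t)
    return z


def sample_one_step(avg_velocity, noise):
    # Mean-flow style sampling: a single evaluation over the interval [0, 1].
    return noise - avg_velocity(noise, 0.0, 1.0)


rng = np.random.default_rng(0)
noise = rng.standard_normal(2)

v = CountingField(instantaneous_velocity)
u = CountingField(average_velocity)

action_iterative = sample_iterative(v, noise, num_steps=10)
action_one_step = sample_one_step(u, noise)

print(v.calls, u.calls)  # prints: 10 1
```

Both samplers land on the same action in this toy setting, but the iterative one needs ten function evaluations and the one-step sampler needs one. With learned networks in place of the closed forms, each evaluation would be a forward pass, which is the cost that one-step sampling avoids.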

Taxonomy

Topics: Scheduling and Optimization Algorithms · Auction Theory and Applications · Supply Chain and Inventory Management