OM2P: Offline Multi-Agent Mean-Flow Policy
Zhuoran Li, Xun Wang, Hai Zhong, Qingxin Xia, Lihua Zhang, Longbo Huang

TL;DR
OM2P is an offline multi-agent reinforcement learning (MARL) algorithm built on mean-flow models. It cuts GPU memory usage by up to 3.8x and training time by up to 10.8x while achieving superior performance on benchmark tasks.
Contribution
It is the first to successfully integrate mean-flow generative models into offline MARL, enabling one-step action sampling and improved efficiency.
Findings
Achieves up to 3.8x reduction in GPU memory usage.
Realizes up to 10.8x faster training times.
Demonstrates superior performance on Multi-Agent Particle and MuJoCo benchmarks.
Abstract
Generative models, especially diffusion and flow-based models, have shown promise in offline multi-agent reinforcement learning (MARL). However, integrating powerful generative models into this framework poses unique challenges. In particular, diffusion and flow-based policies suffer from low sampling efficiency due to their iterative generation processes, making them impractical in time-sensitive or resource-constrained settings. To tackle these difficulties, we propose OM2P (Offline Multi-Agent Mean-Flow Policy), a novel offline MARL algorithm that achieves efficient one-step action sampling. To address the misalignment between generative objectives and reward maximization, we introduce a reward-aware optimization scheme that integrates a carefully designed mean-flow matching loss with Q-function supervision. Additionally, we design a generalized timestep distribution and a derivative-free…
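The efficiency argument in the abstract (iterative generation versus one-step sampling) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the standard mean-flow sampling rule, in which a model of the average velocity u(z_t, r, t) maps noise to a sample in one evaluation via a = z_1 - u(z_1, 0, 1), and it replaces the learned networks with closed-form fields for a "dataset" containing a single action. All names here (CountingField, TARGET_ACTION, sample_iterative, sample_one_step) are hypothetical.

```python
import numpy as np


class CountingField:
    """Wraps a velocity function and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


# Toy assumption: the data distribution is a single target action, so both
# fields have closed forms on the path z_t = (1 - t) * a + t * noise.
TARGET_ACTION = np.array([0.3, -0.7])


def instantaneous_velocity(z, t):
    # v(z_t, t) = noise - a, expressed through the current point z_t.
    return (z - TARGET_ACTION) / t


def mean_velocity(z, r, t):
    # u(z_t, r, t) = (z_t - z_r) / (t - r). On this straight path it equals
    # (z_t - a) / t for every r, so r is unused in the toy closed form.
    return (z - TARGET_ACTION) / t


def sample_iterative(v, noise, num_steps):
    # Flow-style sampling: Euler integration from t = 1 (noise) to t = 0.
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    z = noise.copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        z = z - (t_cur - t_next) * v(z, t_cur)
    return z


def sample_one_step(u, noise):
    # Mean-flow-style sampling: a single evaluation over the interval [0, 1].
    return noise - u(noise, 0.0, 1.0)


rng = np.random.default_rng(0)
noise = rng.standard_normal(2)

v = CountingField(instantaneous_velocity)
u = CountingField(mean_velocity)

a_iter = sample_iterative(v, noise, num_steps=10)
a_one = sample_one_step(u, noise)

print(v.calls, u.calls)  # prints "10 1"
```

Both samplers reach the same action in this toy case, but the iterative one evaluates its field once per integration step. With a neural network in place of the closed forms, that per-step cost is what one-step sampling avoids.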
Taxonomy
Topics: Scheduling and Optimization Algorithms · Auction Theory and Applications · Supply Chain and Inventory Management
