Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
Teng Pang, Zhiqiang Dong, Yan Zhang, Rongjian Xu, Guoqiang Wu, Yilong Yin

TL;DR
VGM$^2$P introduces a flow-based, coefficient-insensitive policy learning framework for offline multi-agent reinforcement learning, improving efficiency and performance by leveraging advantage-guided behavior cloning and classifier-free guidance.
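The TL;DR names two ingredients: advantage-guided behavior cloning and classifier-free guidance. The sketch below shows the generic form of both, under explicit assumptions. The condition is a binary label taken from the sign of an advantage estimate. During training the condition is randomly dropped to a null token, so one network learns both a conditional and an unconditional (plain BC) prediction. At sampling time the two predictions are mixed with a guidance scale `w`. The names `advantage_label`, `training_condition`, `cfg_combine`, `p_drop` and `w` are hypothetical. This excerpt does not specify VGM$^2$P's conditioning signal or guidance scale.

```python
import random
from typing import List, Optional

# Hypothetical sketch of advantage-conditioned classifier-free guidance (CFG).
# None of these names come from the VGM^2P paper.

NULL_COND: Optional[int] = None  # "no condition" token for the unconditional branch


def advantage_label(advantage: float) -> int:
    """Assumed binary condition: 1 if the dataset action beats the value baseline."""
    return 1 if advantage > 0.0 else 0


def training_condition(advantage: float, p_drop: float, rng: random.Random) -> Optional[int]:
    """Condition dropout: with probability p_drop use the null token, so a single
    network learns both the conditional and the unconditional (plain BC) prediction."""
    if rng.random() < p_drop:
        return NULL_COND
    return advantage_label(advantage)


def cfg_combine(u_uncond: List[float], u_cond: List[float], w: float) -> List[float]:
    """CFG at sampling time: w=0 gives the unconditional output, w=1 the
    conditional one, and w>1 extrapolates toward high-advantage actions."""
    return [uu + w * (uc - uu) for uu, uc in zip(u_uncond, u_cond)]


# Usage with made-up network outputs for a 2-D action.
rng = random.Random(0)
conds = [training_condition(a, p_drop=0.1, rng=rng) for a in (0.7, -0.3, 1.2)]
guided = cfg_combine([0.0, 1.0], [1.0, 1.0], w=2.0)
print(guided)  # [2.0, 1.0]
```

In this generic CFG form, guidance is applied after the network call, so the mixing weight can be changed at inference time without retraining.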
Contribution
The paper proposes a flow-based policy method that improves efficiency and expressiveness in offline multi-agent RL while remaining insensitive to the behavior regularization coefficient.
Findings
Achieves state-of-the-art performance using only conditional behavior cloning.
Demonstrates efficiency in both discrete and continuous action tasks.
Maintains performance without iterative sampling or distillation.
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn the optimal joint policy from pre-collected datasets, requiring a trade-off between maximizing global returns and mitigating distribution shift from offline data. Recent studies use diffusion or flow generative models to capture complex joint policy behaviors among agents; however, they typically rely on multi-step iterative sampling, which reduces training and inference efficiency. Although further research improves sampling efficiency through methods such as distillation, these approaches remain sensitive to the behavior regularization coefficient. To address these issues, we propose Value Guidance Multi-agent MeanFlow Policy (VGM$^2$P), a simple yet effective flow-based policy learning framework that enables efficient action generation with coefficient-insensitive conditional behavior cloning. Specifically, VGM$^2$P…
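The abstract's efficiency argument contrasts multi-step iterative sampling with one-step generation. The sketch below assumes the MeanFlow formulation. In that formulation a network predicts an average velocity $u(z_t, r, t)$ over an interval $[r, t]$, so that $z_r = z_t - (t - r)\,u(z_t, r, t)$. A single call with $r = 0$, $t = 1$ then maps noise straight to an action. Everything here is hypothetical and is not VGM$^2$P's code: the point-mass target `A_STAR`, the closed-form `avg_velocity` standing in for a trained network, and the function names.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

# Toy assumption: the "behavior" action distribution is a point mass at A_STAR and
# paths are straight lines z_t = (1 - t) * a + t * eps (t = 1 is noise, t = 0 is the action).
A_STAR: Vector = [0.5, -1.0]


def avg_velocity(z: Vector, r: float, t: float) -> Vector:
    """Stand-in for a learned average-velocity network u(z_t, r, t).
    On straight paths the velocity is constant, so u = (z_t - a) / t for any r."""
    return [(zi - ai) / t for zi, ai in zip(z, A_STAR)]


def euler_sample(eps: Vector, steps: int) -> Tuple[Vector, int]:
    """Iterative flow sampling: Euler-integrate from t = 1 to t = 0, one network call per step."""
    z, calls, dt = list(eps), 0, 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt
        v = avg_velocity(z, t, t)  # u(z, t, t) is the instantaneous velocity
        calls += 1
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z, calls


def meanflow_sample(eps: Vector) -> Tuple[Vector, int]:
    """One-step jump using z_r = z_t - (t - r) * u(z_t, r, t) with r = 0, t = 1."""
    u = avg_velocity(eps, 0.0, 1.0)
    return [zi - ui for zi, ui in zip(eps, u)], 1


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in A_STAR]
a_euler, nfe_euler = euler_sample(eps, steps=20)
a_mf, nfe_mf = meanflow_sample(eps)
print(nfe_euler, nfe_mf)  # 20 1
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a_mf, A_STAR)))  # True
```

Because the toy paths are straight, Euler also lands on `A_STAR` here. With a learned, curved flow, a few large Euler steps would incur discretization error, which is why iterative samplers need many network calls. This excerpt does not state how VGM$^2$P combines one-step sampling with guidance.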
