Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

Teng Pang; Zhiqiang Dong; Yan Zhang; Rongjian Xu; Guoqiang Wu; Yilong Yin

arXiv:2604.08174·cs.LG·April 10, 2026


TL;DR

VGM$^2$P introduces a flow-based, coefficient-insensitive policy learning framework for offline multi-agent reinforcement learning, improving efficiency and performance by leveraging advantage-guided behavior cloning and classifier-free guidance.

Contribution

The paper proposes a flow-based policy method that improves efficiency and expressiveness in offline multi-agent RL while remaining insensitive to the behavior regularization coefficient.

Findings

1. Achieves state-of-the-art performance with only behavior cloning.
2. Demonstrates efficiency in both discrete and continuous action tasks.
3. Maintains performance without iterative sampling or distillation.

Abstract

Offline multi-agent reinforcement learning (MARL) aims to learn the optimal joint policy from pre-collected datasets, requiring a trade-off between maximizing global returns and mitigating distribution shift from offline data. Recent studies use diffusion or flow generative models to capture complex joint policy behaviors among agents; however, they typically rely on multi-step iterative sampling, thereby reducing training and inference efficiency. Although further research improves sampling efficiency through methods like distillation, it remains sensitive to the behavior regularization coefficient. To address the above-mentioned issues, we propose Value Guidance Multi-agent MeanFlow Policy (VGM$^2$P), a simple yet effective flow-based policy learning framework that enables efficient action generation with coefficient-insensitive conditional behavior cloning. Specifically, VGM$^2$P…
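The abstract's contrast between multi-step iterative sampling and one-step generation can be illustrated with a toy flow. The sketch below is not the paper's method: it assumes a hypothetical linear velocity field `K * (target - x)` whose average velocity over a time interval is known in closed form, whereas a flow-based policy would learn that quantity with a network. All names (`K`, `velocity`, `mean_velocity`, `sample_euler`, `sample_one_step`) are made up for illustration.

```python
import numpy as np

K = 3.0  # hypothetical contraction rate of the toy velocity field


def velocity(x, target):
    """Instantaneous velocity of the toy flow dx/dt = K * (target - x)."""
    return K * (target - x)


def mean_velocity(x_start, target, r, t):
    """Average velocity along the exact trajectory from time r to time t.

    Closed form for this toy field only; displacement = (t - r) * mean velocity.
    """
    if t == r:
        return velocity(x_start, target)
    return (target - x_start) * (1.0 - np.exp(-K * (t - r))) / (t - r)


def exact_endpoint(x0, target):
    """Exact solution of the toy ODE at time 1, starting from x0 at time 0."""
    return target + (x0 - target) * np.exp(-K)


def sample_euler(x0, target, num_steps):
    """Multi-step sampling: Euler integration of the instantaneous velocity."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x + dt * velocity(x, target)
    return x


def sample_one_step(x0, target):
    """One-step sampling: a single jump using the average velocity on [0, 1]."""
    x0 = np.asarray(x0, dtype=float)
    return x0 + (1.0 - 0.0) * mean_velocity(x0, target, 0.0, 1.0)


if __name__ == "__main__":
    x0 = np.array([1.5, -0.5])      # stand-in for a noise sample
    target = np.array([0.2, 0.8])   # stand-in for a dataset action
    exact = exact_endpoint(x0, target)
    for n in (2, 10, 100):
        err = np.abs(sample_euler(x0, target, n) - exact).max()
        print(f"Euler, {n:3d} steps: max error {err:.4f}")
    err = np.abs(sample_one_step(x0, target) - exact).max()
    print(f"One step (mean velocity): max error {err:.4f}")
```

In this toy setting the Euler sampler needs many velocity evaluations to approach the exact endpoint, while the average-velocity jump reaches it with one evaluation. In practice the average velocity is only approximated by a learned model, so the one-step result is not exact.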
