Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Zhuoran Li; Hai Zhong; Xun Wang; Qingxin Xia; Lihua Zhang; Longbo Huang

arXiv:2602.18291 · cs.AI · February 23, 2026

Open Access

TL;DR

This paper introduces OMAD, an online multi-agent reinforcement learning framework built on diffusion policies. It improves coordination and exploration without requiring tractable likelihoods, and it reports state-of-the-art results across diverse tasks.

Contribution

The paper presents one of the first online off-policy diffusion policy frameworks for MARL, introducing a relaxed entropy objective and a joint distributional value function for stable decentralized coordination.

Findings

1. Achieves 2.5 to 5 times better sample efficiency.
2. Sets new state-of-the-art performance on 10 tasks.
3. Effectively facilitates exploration without likelihood computation.

Abstract

Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose one of the first Online off-policy MARL frameworks using Diffusion policies (OMAD) to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration…
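To illustrate why a diffusion policy is easy to sample from but hard to evaluate a likelihood for, here is a minimal sketch of standard DDPM-style reverse sampling applied to per-agent actions. It is not OMAD's algorithm: the linear noise schedule, the `toy_noise_predictor` stand-in for a trained network, the random weights, and all dimensions are assumptions made purely for illustration.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(action, obs, t_frac, weights):
    """Hypothetical stand-in for a trained noise-prediction network."""
    features = np.concatenate([action, obs, [t_frac]])
    return np.tanh(weights @ features)


def sample_action(obs, weights, action_dim, num_steps=20, rng=None):
    """Draw one action by iteratively denoising Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.standard_normal(action_dim)  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, obs, t / num_steps, weights)
        # DDPM posterior mean given the predicted noise.
        mean = (action - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No fresh noise is injected on the final step.
        noise = rng.standard_normal(action_dim) if t > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[t]) * noise
    return np.clip(action, -1.0, 1.0)  # assumed bounded action space


# Decentralized execution: each agent samples from its own observation.
rng = np.random.default_rng(0)
obs_dim, action_dim, n_agents = 4, 2, 3
weights = [0.1 * rng.standard_normal((action_dim, action_dim + obs_dim + 1))
           for _ in range(n_agents)]
observations = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
joint_action = np.stack([
    sample_action(o, w, action_dim, rng=rng)
    for o, w in zip(observations, weights)
])
```

Each sampled action is the result of many chained stochastic steps, so its density would require integrating over every intermediate noisy action. That is the sense in which the likelihood, and hence an exact entropy term, is intractable for such policies.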

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications