Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
Zhuoran Li, Hai Zhong, Xun Wang, Qingxin Xia, Lihua Zhang, Longbo Huang

TL;DR
This paper introduces OMAD, an online multi-agent reinforcement learning (MARL) framework built on diffusion policies. It improves coordination and exploration without requiring tractable policy likelihoods and reports state-of-the-art results across diverse tasks.
Contribution
The paper presents one of the first online off-policy diffusion policy frameworks for MARL, introducing a relaxed entropy objective and a joint distributional value function for stable decentralized coordination.
Findings
Achieves 2.5 to 5 times better sample efficiency.
Sets new state-of-the-art performance on 10 tasks.
Effectively facilitates exploration without likelihood computation.
Abstract
Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose one of the first Online off-policy MARL frameworks using Diffusion policies (OMAD) to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration…
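To make the likelihood obstacle concrete, here is a minimal sketch of how a diffusion policy draws an action. It is a generic DDPM-style ancestral sampler, not the OMAD algorithm itself. The noise schedule, the step count, and `toy_noise_predictor` (a stand-in for a trained network) are all assumptions made for illustration. The action comes out of a chain of stochastic denoising steps, so its density would require integrating over every intermediate sample, which is why the exact log-likelihood (and hence entropy) is not available in closed form.

```python
import numpy as np

def make_schedule(num_steps=10, beta_start=1e-4, beta_end=0.2):
    # Linear variance schedule (an assumption; any valid schedule works).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(obs, noisy_action, step):
    # Hypothetical stand-in for a learned network eps_theta(obs, a_t, t).
    # It nudges the action toward tanh(obs); a real policy would be trained.
    target = np.tanh(obs[: noisy_action.shape[0]])
    return noisy_action - target

def sample_action(obs, action_dim, rng, num_steps=10):
    # Reverse (denoising) process: start from Gaussian noise, refine step by step.
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.standard_normal(action_dim)
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(obs, action, t)
        mean = (action - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(action_dim) if t > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[t]) * noise
    return np.clip(action, -1.0, 1.0)

# Decentralized execution: each agent samples from its own local observation.
rng = np.random.default_rng(0)
observations = [rng.standard_normal(4) for _ in range(3)]  # 3 agents, 4-dim obs
joint_action = np.stack([sample_action(o, 2, rng) for o in observations])
print(joint_action.shape)
```

Each call to `sample_action` only yields a sample. Nothing in the loop returns a probability for the final action, which is the gap that a relaxed, likelihood-free entropy objective is meant to address.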
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
