Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning
Zhuoran Li, Ling Pan, Longbo Huang

TL;DR
This paper introduces DOM2, a diffusion-based offline multi-agent reinforcement learning method that enhances policy diversity and expressiveness, leading to improved robustness, generalization, and data efficiency over existing algorithms.
Contribution
The paper integrates a diffusion model into the policy network of an offline MARL algorithm and pairs it with a trajectory-based data-augmentation scheme used during training, with the aim of improving policy expressiveness and robustness.
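To make the idea of a diffusion-based policy concrete, the sketch below shows how an action can be drawn by iteratively denoising Gaussian noise, conditioned on an agent's observation. It is a generic DDPM-style ancestral sampler written under stated assumptions, not the sampler or architecture DOM2 itself uses: the linear schedule, the step count, the per-agent loop, and the names `make_schedule`, `toy_target_action`, `toy_noise_predictor`, and `sample_action` are all hypothetical. In particular, `toy_noise_predictor` is an analytic stand-in for a trained noise-prediction network.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.1):
    """Linear variance schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_target_action(obs):
    """Hypothetical behaviour data: one fixed action per observation."""
    return np.tanh(obs[:2])


def toy_noise_predictor(noisy_action, obs, step, alpha_bars):
    """Stand-in for a trained network eps_theta(a_t, o, t).

    Because the toy data has a single action per observation, the exact
    noise can be computed in closed form instead of being learned.
    """
    target = toy_target_action(obs)
    scale = np.sqrt(1.0 - alpha_bars[step])
    return (noisy_action - np.sqrt(alpha_bars[step]) * target) / scale


def sample_action(obs, action_dim, num_steps, rng):
    """Draw one action by running the reverse (denoising) chain."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.standard_normal(action_dim)  # start from pure noise
    for step in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, obs, step, alpha_bars)
        # Posterior mean of the previous, less noisy action.
        mean = (action - betas[step] / np.sqrt(1.0 - alpha_bars[step]) * eps)
        mean = mean / np.sqrt(alphas[step])
        if step > 0:
            # Inject fresh noise on every step except the last one.
            action = mean + np.sqrt(betas[step]) * rng.standard_normal(action_dim)
        else:
            action = mean
    return action


# Assumed setting: each agent samples from its own local observation.
rng = np.random.default_rng(0)
observations = [np.array([0.5, -1.0, 0.2]), np.array([-0.3, 0.8, 0.0])]
actions = [sample_action(o, action_dim=2, num_steps=10, rng=rng) for o in observations]
```

With this exact toy predictor the final denoising step lands on the single target action for each observation. A network trained on multimodal offline data would instead produce different actions across samples, which is the kind of policy diversity the summary refers to.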
Findings
DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments.
DOM2 generalizes better in shifted environments due to high policy diversity.
DOM2 reaches comparable performance with more than 20 times less data.
Abstract
We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion
