Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition
Yanda Zhu, Yuanyang Zhu, Daoyi Dong, Caihua Chen, Chunlin Chen

TL;DR
This paper introduces C3DT, a hierarchical multi-agent reinforcement learning framework that uses a conditional diffusion model to automatically infer and coordinate subtasks, improving efficiency and performance in complex dynamic environments.
Contribution
The paper proposes a novel two-level hierarchical MARL framework with a diffusion model for dynamic task decomposition, enabling automatic subtask inference and better coordination.
Findings
C3DT outperforms existing methods on benchmark tasks.
The diffusion-based approach improves sample efficiency.
Enhanced value decomposition through semantic subtask representations.
Abstract
Task decomposition has shown promise in complex cooperative multi-agent reinforcement learning (MARL) tasks, which enables efficient hierarchical learning for long-horizon tasks in dynamic and uncertain environments. However, learning dynamic task decomposition from scratch generally requires a large number of training samples, especially exploring the large joint action space under partial observability. In this paper, we present the Conditional Diffusion Model for Dynamic Task Decomposition (CT), a novel two-level hierarchical MARL framework designed to automatically infer subtask and coordination patterns. The high-level policy learns subtask representation to generate a subtask selection strategy based on subtask effects. To capture the effects of subtasks on the environment, CT predicts the next observation and reward using a conditional…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
