Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics
Nitsan Soffair

TL;DR
This paper introduces SA2MA, a novel two-stage deep reinforcement learning algorithm that effectively solves complex cooperative Dec-POMDPs by leveraging single-agent policies to improve multi-agent coordination.
Contribution
SA2MA is a new algorithm that first solves a single-agent problem and then uses the resulting policy to solve the multi-agent cooperative problem, outperforming WQMIX, QMIX, QTRAN, and VDN in domains that require complex cooperation.
Findings
SA2MA outperforms the SOTA baselines (WQMIX, QMIX, QTRAN, and VDN) in domains that require complex cooperation among agents.
The two-stage approach simplifies solving multi-agent cooperation.
SA2MA demonstrates clear advantages in complex cooperative tasks.
Abstract
WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for Dec-POMDPs, but none of them can solve domains that require complex cooperation among agents. We present SA2MA, a two-stage algorithm for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we use that single-agent policy to solve the multi-agent problem. SA2MA has a clear advantage over all of these competitors in domains that require complex cooperation among agents.
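The abstract does not say how the single-agent policy enters the second stage. The sketch below shows one possible reading, offered only as an illustration and not as the authors' method: solve a toy single-agent corridor task with tabular Q-value iteration, then warm-start a two-agent joint Q-table as the sum of the per-agent values (a VDN-style additive initialisation). The corridor environment, the names `solve_single_agent`, `warm_start_joint_q`, and `greedy_joint_action`, and the additive warm start are all assumptions introduced here. The paper itself uses deep reinforcement learning rather than tables.

```python
import itertools

N_STATES = 5          # hypothetical corridor with cells 0..4, goal at cell 4
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Deterministic corridor dynamics; reward 1.0 on entering the goal cell."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if (nxt == N_STATES - 1 and state != N_STATES - 1) else 0.0
    return nxt, reward


def solve_single_agent(sweeps=100):
    """Stage 1 (assumed form): tabular Q-value iteration for a single agent."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES - 1):      # the goal cell is absorbing, Q stays 0
            for a in ACTIONS:
                nxt, r = step(s, a)
                q[(s, a)] = r + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
    return q


def warm_start_joint_q(q_single):
    """Stage 2 initialisation (assumed): joint Q as the sum of per-agent Q-values."""
    joint = {}
    for s1, s2 in itertools.product(range(N_STATES), repeat=2):
        for a1, a2 in itertools.product(ACTIONS, repeat=2):
            joint[((s1, s2), (a1, a2))] = q_single[(s1, a1)] + q_single[(s2, a2)]
    return joint


def greedy_joint_action(joint_q, joint_state):
    """Pick the joint action with the highest warm-started joint Q-value."""
    return max(itertools.product(ACTIONS, repeat=2),
               key=lambda joint_action: joint_q[(joint_state, joint_action)])


q_single = solve_single_agent()
joint_q = warm_start_joint_q(q_single)
print(greedy_joint_action(joint_q, (0, 2)))
```

In this toy setting the warm start only gives the second stage a starting point. Any real multi-agent training step that refines the joint values (for example, to handle cases where agents must coordinate rather than act independently) would follow it and is not shown here.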
Taxonomy
Topics: Auction Theory and Applications · Optimization and Search Problems · Multi-Agent Systems and Negotiation
