Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics

Nitsan Soffair

arXiv:2211.15411 · cs.LG · September 2, 2024

TL;DR

This paper introduces SA2MA, a novel two-stage deep reinforcement learning algorithm that effectively solves complex cooperative Dec-POMDPs by leveraging single-agent policies to improve multi-agent coordination.

Contribution

SA2MA is a new algorithm that first solves a single-agent problem and then uses that policy to address multi-agent cooperation, outperforming existing methods.

Findings

01. SA2MA outperforms SOTA algorithms in complex domains.

02. The two-stage approach simplifies solving multi-agent cooperation.

03. SA2MA demonstrates clear advantages in complex cooperative tasks.

Abstract

WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for Dec-POMDPs. None of them can solve domains that require complex cooperation among agents. We present an algorithm, SA2MA, for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using the single-agent policy. SA2MA has a clear advantage over all competitors in domains that require complex agent cooperation.
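
The two-stage structure described in the abstract can be illustrated with a toy sketch. The abstract does not say how the single-agent policy is used in the second stage, so everything below is an assumption made for illustration and is not the SA2MA algorithm itself: a hypothetical five-cell corridor, tabular Q-learning in place of deep reinforcement learning, and a second stage that follows the single-agent policy as an exploration guide while learning a joint policy for two agents. All names (move, q_learning, guide, rollout) are hypothetical.

```python
import random

GOAL = 4          # toy corridor with cells 0..4; the goal is the right end
MOVES = (-1, +1)  # action 0 = step left, action 1 = step right

def move(pos, action):
    """Deterministic corridor dynamics; an agent that reached GOAL stays there."""
    if pos == GOAL:
        return GOAL
    return min(max(pos + MOVES[action], 0), GOAL)

def q_learning(start, actions, transition, is_goal, episodes, rng,
               guide=None, alpha=0.5, gamma=0.9, eps=0.2, guide_prob=0.5):
    """Tabular Q-learning with reward 1 on reaching the goal.

    `guide` is an optional heuristic policy (an assumption of this sketch)
    that the behaviour policy follows with probability `guide_prob`.
    """
    q = {}

    def best(state):
        return max(actions, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        s = start(rng)
        for _ in range(50):  # step cap per episode
            u = rng.random()
            if u < eps:
                a = rng.choice(actions)       # random exploration
            elif guide is not None and u < eps + guide_prob:
                a = guide(s)                  # follow the heuristic policy
            else:
                a = best(s)                   # act greedily
            s2 = transition(s, a)
            done = is_goal(s2)
            target = 1.0 if done else gamma * q.get((s2, best(s2)), 0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
    return q, best

rng = random.Random(0)

# Stage 1: solve the single-agent problem and keep its greedy policy.
_, single_policy = q_learning(
    start=lambda r: r.randrange(GOAL),
    actions=(0, 1),
    transition=move,
    is_goal=lambda s: s == GOAL,
    episodes=2000, rng=rng)

# Stage 2: solve the two-agent problem, guided by the single-agent policy.
JOINT_ACTIONS = tuple((a1, a2) for a1 in (0, 1) for a2 in (0, 1))

def joint_move(state, joint_action):
    return tuple(move(p, a) for p, a in zip(state, joint_action))

def guide(state):
    # Each agent proposes the action its single-agent policy would take.
    return tuple(single_policy(p) for p in state)

_, team_policy = q_learning(
    start=lambda r: (0, 0),
    actions=JOINT_ACTIONS,
    transition=joint_move,
    is_goal=lambda s: all(p == GOAL for p in s),  # team reward: both at GOAL
    episodes=500, rng=rng, guide=guide)

def rollout(policy, state=(0, 0), max_steps=20):
    """Run the greedy joint policy and return the final state and step count."""
    steps = 0
    while not all(p == GOAL for p in state) and steps < max_steps:
        state = joint_move(state, policy(state))
        steps += 1
    return state, steps

print(rollout(team_policy))
```

In this toy the single-agent policy is already close to what each agent needs, so the guide mostly speeds up discovery of the joint goal. The domains the abstract targets require real coordination, which this sketch does not attempt to model.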

Taxonomy

Topics: Auction Theory and Applications · Optimization and Search Problems · Multi-Agent Systems and Negotiation