Probing Dec-POMDP Reasoning in Cooperative MARL
Kale-ab Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey

TL;DR
This paper introduces diagnostic tools to evaluate whether current cooperative MARL benchmarks genuinely require Dec-POMDP reasoning, revealing that many do not, and highlights the need for more rigorous environment design.
Contribution
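The authors develop a diagnostic suite that combines statistically grounded performance comparisons with information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax.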
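To make the idea of an information-theoretic probe concrete, here is a minimal sketch. It is not the paper's probe: it assumes discrete observations and actions, and it uses a plug-in estimate of the conditional mutual information I(A; M | O) between the action A and a memory variable M (for example, the previous observation) given the current observation O. The function name and the toy data are hypothetical. A value near zero suggests the policy behaves reactively; a clearly positive value suggests the action depends on history beyond the current observation.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(samples):
    """Plug-in estimate of I(A; M | O) in bits.

    `samples` is a list of (o, m, a) triples: current observation,
    memory variable (assumed here: previous observation), and action.
    """
    n = len(samples)
    c_oma = Counter(samples)
    c_om = Counter((o, m) for o, m, a in samples)
    c_oa = Counter((o, a) for o, m, a in samples)
    c_o = Counter(o for o, m, a in samples)

    cmi = 0.0
    for (o, m, a), count in c_oma.items():
        # p(o,m,a) * log[ p(o,m,a) p(o) / (p(o,m) p(o,a)) ], written in counts
        ratio = count * c_o[o] / (c_om[(o, m)] * c_oa[(o, a)])
        cmi += (count / n) * log2(ratio)
    return cmi


# Toy data, invented for illustration only.
# Reactive policy: the action copies the current observation.
reactive = [(o, m, o) for o in (0, 1) for m in (0, 1)]
# Memory-based policy: the action copies the memory variable.
memory_based = [(o, m, m) for o in (0, 1) for m in (0, 1)]

reactive_cmi = conditional_mutual_information(reactive)
memory_cmi = conditional_mutual_information(memory_based)
```

On real rollouts the plug-in estimator is biased upward for small samples, so any such probe would need a correction or a permutation baseline before its values are interpreted.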
Findings
Many benchmarks do not require true Dec-POMDP reasoning.
Reactive policies often match memory-based agents' performance.
Emergent coordination is often based on brittle synchronous actions.
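One way to check a claim such as "reactive policies match memory-based agents" is to compare per-seed returns with a confidence interval on the difference of means. The sketch below uses a percentile bootstrap; this is an assumed procedure for illustration, not necessarily the statistical test used in the paper, and the return values are invented.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(returns_a, returns_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(returns_a) - mean(returns_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        sample_a = [rng.choice(returns_a) for _ in returns_a]
        sample_b = [rng.choice(returns_b) for _ in returns_b]
        diffs.append(mean(sample_a) - mean(sample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-seed episode returns, invented for illustration only.
memory_returns = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
reactive_returns = [10.0, 9.9, 10.2, 10.1, 9.8, 10.0, 10.2, 9.9]

lower, upper = bootstrap_mean_diff_ci(memory_returns, reactive_returns)
# Memory is only credited with a benefit if the whole interval lies above zero.
memory_helps = lower > 0
```

If the interval straddles zero, the data give no evidence that memory improves returns in that scenario, which is the pattern the findings above describe.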
Abstract
Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP…
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
