Remembering the Markov Property in Cooperative MARL
Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, David Abel, Amos Storkey

TL;DR
This paper critically examines the effectiveness of current MARL algorithms, arguing that their success stems from learning simple conventions rather than true Markovian reasoning, and advocates for new benchmarks emphasizing observation-grounded and memory-based reasoning.
Contribution
The paper highlights limitations of current MARL benchmarks and proposes principles for designing environments that require genuine Markovian reasoning and observation-grounded behaviors.
Findings
Current MARL success is due to learned conventions, not Markovian reasoning.
Brittle conventions can fail with non-adaptive partners.
Properly designed environments can promote true reasoning skills.
Abstract
Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents' behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a…
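The distinction the abstract draws between a convention and an observation-grounded policy can be illustrated with a toy coordination game. This is a minimal sketch under stated assumptions, not the paper's case study: the two-action matching game, the shared binary signal, and the names `convention_policy`, `grounded_policy`, and `average_return` are all hypothetical and chosen only for illustration.

```python
import random


def convention_policy(obs):
    # Hypothetical convention: ignores the observation and always plays action 0.
    return 0


def grounded_policy(obs):
    # Hypothetical grounded policy: plays the action indicated by the observed signal.
    return obs


def average_return(policy_a, policy_b, episodes=1000, seed=0):
    """Average reward over one-step episodes; reward is 1 when the two actions match."""
    rng = random.Random(seed)
    total = 0
    for _ in range(episodes):
        # Assumption: both agents observe the same binary signal each episode.
        signal = rng.randint(0, 1)
        total += int(policy_a(signal) == policy_b(signal))
    return total / episodes


if __name__ == "__main__":
    # Two co-adapted agents sharing the convention coordinate perfectly.
    print(average_return(convention_policy, convention_policy))
    # The same convention paired with a fixed, signal-following partner
    # only matches when the signal happens to be 0.
    print(average_return(convention_policy, grounded_policy))
    # A grounded agent coordinates with the fixed partner on every episode.
    print(average_return(grounded_policy, grounded_policy))
```

In this sketch the convention earns full reward in self-play without ever reading its observation, which is why a benchmark evaluated only with co-adapted partners could not tell it apart from the grounded policy.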
Taxonomy
Topics: Semantic Web and Ontologies
