The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Constantin Ruhdorfer, Matteo Bortoletto, Johannes Forkel, Jakob Foerster, Andreas Bulling

TL;DR
The Yokai Learning Environment (YLE) is a new multi-agent RL benchmark designed to better evaluate cooperative AI by requiring agents to track beliefs and shared knowledge, revealing limitations of existing methods in generalization and internal modeling.
Contribution
We introduce YLE, a novel benchmark that emphasizes belief tracking and reasoning, addressing limitations of the Hanabi Learning Environment in measuring algorithmic progress.
Findings
Leading zero-shot coordination (ZSC) methods perform poorly in YLE relative to their results in HLE.
YLE exposes persistent gaps and calibration issues in current algorithms.
Progress on HLE does not necessarily translate to YLE.
Abstract
The ability to cooperate with unknown partners is a central challenge in cooperative AI and is widely studied in the form of zero-shot coordination (ZSC), which evaluates an algorithm by measuring the performance of independently trained agents when paired. The Hanabi Learning Environment (HLE) has become the dominant benchmark for ZSC, but recent work has achieved near-perfect inter-seed cross-play performance, which limits the benchmark's ability to track algorithmic progress. We introduce the Yokai Learning Environment (YLE), an open-source multi-agent RL benchmark in which effective collaboration requires building common ground by tracking and updating beliefs over moving cards, reasoning under ambiguous hints, and deciding when to terminate the game based on inferred shared knowledge. These features are absent in the HLE, where beliefs are tied to hand slots and hints are truthful by rule. We evaluate the…
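The ZSC protocol described in the abstract (pairing independently trained agents and measuring their joint performance) can be illustrated with a minimal sketch. This is not code from the YLE release: the function names, the `evaluate_pair` callback, and the toy agents are all hypothetical, and a real evaluation would average returns over many episodes of the actual environment.

```python
from itertools import product
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

Agent = TypeVar("Agent")


def cross_play_matrix(
    agents: Sequence[Agent],
    evaluate_pair: Callable[[Agent, Agent], float],
) -> List[List[float]]:
    """Score every ordered pairing of independently trained agents.

    `evaluate_pair` is a hypothetical callback that returns the mean
    episode score when the two given agents play together.
    """
    n = len(agents)
    return [[evaluate_pair(agents[i], agents[j]) for j in range(n)] for i in range(n)]


def self_play_score(matrix: List[List[float]]) -> float:
    """Mean of the diagonal: each agent paired with itself."""
    return mean(matrix[i][i] for i in range(len(matrix)))


def cross_play_score(matrix: List[List[float]]) -> float:
    """Mean of the off-diagonal entries: agents paired across seeds."""
    n = len(matrix)
    return mean(matrix[i][j] for i, j in product(range(n), repeat=2) if i != j)


# Toy stand-in (an assumption for illustration): each "agent" is just the
# integer ID of the convention its training seed converged to, and a pair
# scores 1.0 only when both agents share that convention.
def toy_evaluate(a: int, b: int) -> float:
    return 1.0 if a == b else 0.0


toy_agents = [0, 1, 1]
matrix = cross_play_matrix(toy_agents, toy_evaluate)
sp = self_play_score(matrix)
xp = cross_play_score(matrix)
print(f"self-play: {sp:.2f}, cross-play: {xp:.2f}")
```

In this toy setup every agent coordinates perfectly with itself, while only the two agents that happened to learn the same convention coordinate with each other, so the cross-play mean falls well below the self-play mean. A difference of that kind between the diagonal and off-diagonal scores is what inter-seed cross-play evaluation is meant to surface.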
Taxonomy
Topics: Educational Environments and Student Outcomes
