TL;DR
This paper introduces two search techniques to enhance cooperative policies in partially observable games, demonstrating significant performance improvements and achieving state-of-the-art results in Hanabi.
Contribution
It proposes single-agent and multi-agent search methods that improve cooperative policies with theoretical guarantees and practical success in Hanabi.
Findings
Greatly improved Hanabi scores with the search techniques.
Achieved a new state-of-the-art score of 24.61/25 in Hanabi.
Both methods are guaranteed to perform at least as well as the original policy.
Abstract
Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants to both interpret the actions of others and to act in a way that is informative when being interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to…
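The single-agent search idea can be illustrated with a minimal sketch. This is not the paper's implementation: the toy hint game, the belief dictionary, and the names single_agent_search, rollout_value, and partner_policy are hypothetical, and the sketch assumes that every agent other than the searcher follows the fixed agreed-upon policy. The searcher scores each of its actions by expected return under its belief over the hidden state and deviates from the agreed-upon action only when another action is strictly better.

```python
from typing import Callable, Dict, Hashable, Sequence


def single_agent_search(
    belief: Dict[Hashable, float],
    actions: Sequence[Hashable],
    agreed_action: Hashable,
    rollout_value: Callable[[Hashable, Hashable], float],
) -> Hashable:
    """Return the searcher's action with the highest expected value.

    belief maps each possible hidden state to its probability.
    rollout_value(state, action) is the return obtained when the searcher
    plays `action` and all other agents then follow the agreed-upon policy
    (an assumption of this sketch).
    """
    def expected(action: Hashable) -> float:
        return sum(p * rollout_value(s, action) for s, p in belief.items())

    best = max(actions, key=expected)
    # Deviate only on a strict improvement, so the chosen action is never
    # worse in expectation than the agreed-upon one.
    if expected(best) > expected(agreed_action):
        return best
    return agreed_action


# Hypothetical toy game: the hidden state is 0 or 1. The searcher may hint
# or pass; the partner then plays a card and the team scores 1 on a match.
def partner_policy(searcher_action: str) -> int:
    # Fixed partner convention: play the hinted value, else play 0.
    return 1 if searcher_action == "hint1" else 0


def rollout_value(state: int, searcher_action: str) -> float:
    return 1.0 if partner_policy(searcher_action) == state else 0.0


ACTIONS = ["hint0", "hint1", "pass"]

# The searcher's own agreed-upon action is "pass" in this toy setup.
choice = single_agent_search({0: 0.3, 1: 0.7}, ACTIONS, "pass", rollout_value)
tie_choice = single_agent_search({0: 0.9, 1: 0.1}, ACTIONS, "pass", rollout_value)
```

With the first belief, hinting 1 has the highest expected return, so the searcher deviates; with the second, no action strictly beats passing, so the agreed-upon action is kept. A real Hanabi agent would estimate the rollout values by sampling from the belief instead of enumerating every hidden state.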
