Joint Policy Search for Multi-agent Collaboration with Imperfect Information
Yuandong Tian, Qucheng Gong, Tina Jiang

TL;DR
This paper introduces Joint Policy Search (JPS), a method for improving the joint policies of collaborating agents in imperfect information games. It comes with a non-worsening guarantee on tabular games and reports strong results in Contract Bridge.
Contribution
The paper proposes JPS, an algorithm that decomposes changes in game value into policy changes localized at each information set, enabling joint policy improvement without full re-evaluation.
Findings
JPS guarantees non-worsening of performance on tabular games.
JPS outperforms existing algorithms like BAD in collaborative settings.
JPS achieves state-of-the-art results in Contract Bridge, surpassing championship software.
Abstract
Learning good joint policies for multi-agent collaboration with imperfect information remains a fundamental challenge. While for two-player zero-sum games, coordinate-ascent approaches (optimizing one agent's policy at a time, e.g., self-play) work with guarantees, in multi-agent cooperative settings they often converge to sub-optimal Nash equilibria. On the other hand, directly modeling joint policy changes in imperfect information games is nontrivial due to the complicated interplay of policies (e.g., upstream updates affect downstream state reachability). In this paper, we show that global changes of game values can be decomposed into policy changes localized at each information set, with a novel term named policy-change density. Based on this, we propose Joint Policy Search (JPS), which iteratively improves joint policies of collaborative agents in imperfect information games, without…
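The abstract's point about coordinate ascent stalling at a sub-optimal equilibrium can be seen in a toy example. The sketch below is not JPS and is not taken from the paper: it assumes a hypothetical 2x2 fully cooperative matrix game (the `PAYOFF` table and both function names are invented for illustration) and compares one-agent-at-a-time best responses with a brute-force search over joint actions.

```python
import itertools

# Hypothetical shared payoff for a 2x2 cooperative game (an assumption for
# illustration): PAYOFF[a1][a2] is the reward both agents receive.
PAYOFF = [[1.0, 0.0],
          [0.0, 2.0]]


def coordinate_ascent(a1, a2, max_rounds=10):
    """Alternate best responses, changing only one agent's action at a time."""
    for _ in range(max_rounds):
        # Agent 1 best-responds while agent 2's action is held fixed.
        new_a1 = max(range(2), key=lambda a: PAYOFF[a][a2])
        # Agent 2 best-responds while agent 1's (updated) action is held fixed.
        new_a2 = max(range(2), key=lambda a: PAYOFF[new_a1][a])
        if (new_a1, new_a2) == (a1, a2):
            break  # no unilateral change helps: a Nash equilibrium
        a1, a2 = new_a1, new_a2
    return a1, a2


def joint_search():
    """Brute-force search over joint actions (feasible only for tiny games)."""
    return max(itertools.product(range(2), repeat=2),
               key=lambda pair: PAYOFF[pair[0]][pair[1]])


stuck = coordinate_ascent(0, 0)
best = joint_search()
print("coordinate ascent:", stuck, PAYOFF[stuck[0]][stuck[1]])
print("joint search:     ", best, PAYOFF[best[0]][best[1]])
```

Starting from the joint action (0, 0), neither agent can improve the shared payoff alone, so coordinate ascent stays there, whereas changing both actions together reaches the higher-payoff pair (1, 1). Exhaustive joint search does not scale beyond toy games, which is the gap a method that searches over joint policy changes aims to close.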
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Multi-Agent Systems and Negotiation
