Tree Search for Simultaneous Move Games via Equilibrium Approximation

Ryan Yu; Alex Olshevsky; Peter Chin

arXiv:2406.10411 · cs.MA · June 18, 2024

Open Access · 4 Reviews

TL;DR

This paper introduces a tree search method that approximates equilibrium strategies for simultaneous move games, improving performance on benchmarks like Google Research Football and Starcraft.

Contribution

The paper adapts tree search algorithms from perfect information games to simultaneous move games by approximating an equilibrium over the players' joint actions, and reports results that exceed current MARL algorithms on multiple benchmarks.

Findings

01 · Outperforms current MARL algorithms on multiple benchmarks

02 · Effective in cooperative, competitive, and mixed environments

03 · Provides a practical equilibrium approximation method for tree search

Abstract

Neural network supported tree-search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on partial information games has generally been below competing approaches. Here we study the class of simultaneous-move games, which are a subclass of partial information games which are most similar to perfect information games: both agents know the game state with the exception of the opponent's move, which is revealed only after each agent makes its own move. Simultaneous move games include popular benchmarks such as Google Research Football and Starcraft. In this study we answer the question: can we take tree search algorithms trained through self-play from perfect information settings and adapt them to simultaneous move games without significant loss of performance? We answer this question by deriving a practical method…
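As a rough illustration of what approximating an equilibrium of a single simultaneous-move decision can look like, the sketch below runs regret matching on a two-player matrix game and returns the time-averaged joint action distribution, which approaches a coarse correlated equilibrium (CCE) as the iteration count grows. This is not the authors' algorithm: the payoff matrices `u1` and `u2` merely stand in for value estimates of joint actions, and the function names `regret_matching_strategy`, `approximate_cce`, and `cce_gap` are hypothetical.

```python
def regret_matching_strategy(regrets):
    """Mix in proportion to positive cumulative regret; uniform if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def approximate_cce(u1, u2, iterations=20000):
    """Return the time-averaged joint distribution of two regret-matching players.

    u1[a][b] and u2[a][b] are the payoffs to player 1 and player 2 when
    player 1 picks row a and player 2 picks column b (assumed inputs).
    """
    n_rows, n_cols = len(u1), len(u1[0])
    regrets1 = [0.0] * n_rows
    regrets2 = [0.0] * n_cols
    joint = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(iterations):
        s1 = regret_matching_strategy(regrets1)
        s2 = regret_matching_strategy(regrets2)
        # Expected payoff of each pure action against the opponent's current mix.
        v1 = [sum(s2[b] * u1[a][b] for b in range(n_cols)) for a in range(n_rows)]
        v2 = [sum(s1[a] * u2[a][b] for a in range(n_rows)) for b in range(n_cols)]
        ev1 = sum(s1[a] * v1[a] for a in range(n_rows))
        ev2 = sum(s2[b] * v2[b] for b in range(n_cols))
        # Accumulate the regret for not having played each pure action.
        for a in range(n_rows):
            regrets1[a] += v1[a] - ev1
        for b in range(n_cols):
            regrets2[b] += v2[b] - ev2
        # Average the product distribution played at this iteration.
        for a in range(n_rows):
            for b in range(n_cols):
                joint[a][b] += s1[a] * s2[b] / iterations
    return joint


def cce_gap(joint, u1, u2):
    """Largest gain any player gets by switching to one fixed action."""
    n_rows, n_cols = len(u1), len(u1[0])
    row_marginal = [sum(joint[a]) for a in range(n_rows)]
    col_marginal = [sum(joint[a][b] for a in range(n_rows)) for b in range(n_cols)]
    base1 = sum(joint[a][b] * u1[a][b] for a in range(n_rows) for b in range(n_cols))
    base2 = sum(joint[a][b] * u2[a][b] for a in range(n_rows) for b in range(n_cols))
    best1 = max(sum(col_marginal[b] * u1[a][b] for b in range(n_cols)) for a in range(n_rows))
    best2 = max(sum(row_marginal[a] * u2[a][b] for a in range(n_rows)) for b in range(n_cols))
    return max(best1 - base1, best2 - base2)


if __name__ == "__main__":
    # A small zero-sum example game (illustrative payoffs, not from the paper).
    u1 = [[2.0, -1.0], [-1.0, 1.0]]
    u2 = [[-x for x in row] for row in u1]
    joint = approximate_cce(u1, u2)
    print("joint distribution:", joint)
    print("CCE gap:", cce_gap(joint, u1, u2))
```

In a tree search setting, a routine of this kind would be applied to the estimated values of the joint actions at a node, and the resulting distribution would be used to select or evaluate moves. How the paper actually combines equilibrium approximation with search is described in its Section 4, not in this sketch.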

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- The related work is strong, and the paper is reasonably well motivated
- The outline of the methodology is clear
- The motivation for using the simulation depth of 1 is strong
- The results are overall very strong, however I would have liked to see some more of the PSRO variants that focus on other aspects (e.g. population diversity) that may be stronger performing baselines than the more standard PSRO and jPSRO, especially in larger scale environments. However, the general strong performance

Weaknesses

- It is a little difficult to follow the first part of section 4.1, e.g. the writing suggests the value network only takes joint actions as inputs, I assume it also takes the state? The equations 1 through 5 could also use a bit more explanation.
- I am not sure about the argument made that the PSRO methods are not designed to work in large environments, e.g. Towards Unifying Behavioural and Response Diversity for Open-ended Learning in Zero-Sum games (Liu et al. 2021) applied a diversity aw

Reviewer 02 · Rating 3 · Confidence 4

Strengths

There was a lot of effort put into designing the algorithm, running many experiments, and writing the paper.

Weaknesses

The paper has several weaknesses which prevent me from recommending it for acceptance. I think the actual description of the algorithm is unclear. At inference time, what is the tree-search algorithm? Appendix A.5 suggests that there will be a game tree and an MCTS-like algorithm (as does the title of the paper), but Section 4 suggests that the method is no-regret algorithms using 0-step lookahead (just q-values)? The experiments performed also don't test the core hypothesis of the paper -- th

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. The proposed depth-limited scheme is well-suited for simultaneous-move games.
2. Extensive experiments validate the method across various scenarios.
3. The method allows for better parallelization, making it more practical for real-world games.

Weaknesses

1. The paper's writing and formatting are poor, with some images, tables, and equations arranged in a cluttered, two-column layout (e.g., Figures 1 and 4, Tables 1 and 2, and Equations 1-5), making the document look disorganized. Additionally, citation formatting is incorrect; in the ICLR template, \citep should be used instead of \citet when authors or publications are not part of the sentence.
2. The paper lacks theoretical support. While it spends much space arguing that CCE is superior to mi

Reviewer 04 · Rating 3 · Confidence 5

Strengths

The presentation is clear.

Weaknesses

The technical contribution is not sound.

Taxonomy

Topics: Game Theory and Applications · Artificial Intelligence in Games · Game Theory and Voting Systems