Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games
Vojtěch Kovařík, Viliam Lisý

TL;DR
This paper investigates Hannan consistent (HC) selection algorithms within Monte Carlo Tree Search for simultaneous move games. It shows that applying them directly does not by itself guarantee convergence to a Nash equilibrium; convergence requires either additional averaging or additional properties of the algorithm.
Contribution
It identifies conditions under which Hannan consistent algorithms are guaranteed to converge to a Nash equilibrium in simultaneous move games, and analyzes the empirical performance of these algorithms.
Findings
Direct HC application does not guarantee convergence.
Averaging over joint actions ensures convergence but is slower.
Common HC algorithms possess properties that guarantee convergence without averaging.
Abstract
Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium (NE) of the game. A similar result is known about extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves, but otherwise perfect information. These games generalize normal form games and they are a special case of extensive form games. We study whether applying HC algorithms in each decision point of these games directly to the observed payoffs leads to convergence…
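The normal form result mentioned in the abstract can be illustrated with a minimal sketch. It assumes regret matching as the HC algorithm and a hypothetical 2x2 zero-sum payoff matrix, and it uses full-information updates with expected payoffs, which is simpler than the sampled-payoff setting inside a search tree. The function names (`regret_matching_strategy`, `self_play`, `exploitability`) are illustrative and not taken from the paper.

```python
import numpy as np


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret yet: fall back to the uniform strategy.
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def self_play(payoff, iterations=20000):
    """Both players run regret matching; returns their average strategies.

    payoff[i, j] is the row player's payoff; the column player gets -payoff[i, j].
    """
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    sum_row, sum_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x = regret_matching_strategy(regret_row)
        y = regret_matching_strategy(regret_col)
        sum_row += x
        sum_col += y
        u_row = payoff @ y        # expected payoff of each row action
        u_col = -(x @ payoff)     # expected payoff of each column action
        # Regret of an action: its payoff minus the payoff of the played mix.
        regret_row += u_row - x @ u_row
        regret_col += u_col - y @ u_col
    return sum_row / iterations, sum_col / iterations


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains; zero exactly at an NE."""
    return np.max(payoff @ y) - np.min(x @ payoff)


# Hypothetical asymmetric matching-pennies game (an assumption for this sketch).
payoff = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_avg, y_avg = self_play(payoff)
print("average row strategy:", x_avg)
print("average column strategy:", y_avg)
print("exploitability:", exploitability(payoff, x_avg, y_avg))
```

The quantity to watch is the exploitability of the average strategies, which equals the sum of the two players' average regrets in this setting and therefore shrinks as the number of iterations grows. The current strategies in any single iteration need not be close to equilibrium; only the averages are covered by the guarantee.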
