Look-ahead Search on Top of Policy Networks in Imperfect Information Games
Ondrej Kubicek, Neil Burch, Viliam Lisy

TL;DR
This paper introduces a method for adding test-time look-ahead search to policy-gradient reinforcement learning algorithms in imperfect information games. Training is extended with a critic network learned alongside the policy network.
Contribution
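The paper proposes a scalable test-time search method in which a critic network, trained alongside the policy network, estimates the expected values of players following transformations of the learned policies. These values serve as the value function for depth-limited search and supply the summary statistics needed to start search from an arbitrary decision point, with no additional training during search.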
Findings
Enhanced performance in Leduc hold'em and Goofspiel
Scalable approach suitable for large games
Effective integration of search with policy networks
Abstract
Search in test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game.…
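The abstract does not spell out how the critic's values enter the search, so the following is only a toy sketch under stated assumptions, not the authors' algorithm. It assumes that at the depth limit the opponent may choose among K transformed continuation policies, and that a critic has already produced a hypothetical matrix `leaf_values[a, k]`: the searching player's estimated value after root action `a` when the opponent continues with transformation `k`. The resulting zero-sum matrix game is solved with regret matching, which is a stand-in solver chosen for brevity. The names `solve_depth_limited_root` and `leaf_values` are illustrative only.

```python
import numpy as np


def regret_matching(regrets):
    """Turn cumulative regrets into a strategy (uniform if none are positive)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


def solve_depth_limited_root(leaf_values, iterations=20000):
    """Approximately solve a one-step, depth-limited zero-sum game.

    leaf_values[a, k] is an assumed critic estimate of the searcher's value
    for root action a when the opponent continues with transformed policy k.
    Returns the average strategies of both players and the value they induce.
    """
    n_actions, n_transforms = leaf_values.shape
    regrets_p = np.zeros(n_actions)
    regrets_o = np.zeros(n_transforms)
    avg_p = np.zeros(n_actions)
    avg_o = np.zeros(n_transforms)

    for _ in range(iterations):
        sp = regret_matching(regrets_p)
        so = regret_matching(regrets_o)
        avg_p += sp
        avg_o += so

        # Expected value of each root action against the opponent's current mix.
        action_values = leaf_values @ so
        # The opponent minimizes the searcher's value, so negate.
        transform_values = -(sp @ leaf_values)

        # Regret = value of each pure choice minus value of the current mix.
        regrets_p += action_values - sp @ action_values
        regrets_o += transform_values - so @ transform_values

    avg_p /= iterations
    avg_o /= iterations
    return avg_p, avg_o, float(avg_p @ leaf_values @ avg_o)


if __name__ == "__main__":
    # Made-up critic outputs for 2 root actions and 2 opponent transformations.
    leaf_values = np.array([[2.0, -1.0], [-1.0, 1.0]])
    strategy, opponent_mix, value = solve_depth_limited_root(leaf_values)
    print(strategy, opponent_mix, value)
```

For the made-up matrix above, the exact equilibrium plays the first root action with probability 0.4 and has value 0.2, and the averaged regret-matching strategies approach those numbers as the iteration count grows. A real imperfect information search would operate over information sets and ranges rather than a single matrix, which this sketch deliberately omits.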
Taxonomy
Topics: Peer-to-Peer Network Technologies
