Look-ahead Search on Top of Policy Networks in Imperfect Information Games

Ondrej Kubicek; Neil Burch; Viliam Lisy

arXiv:2312.15220 · cs.GT · January 30, 2025 · 1 citation

Open Access

TL;DR

This paper introduces a method for adding test-time look-ahead search to policy-gradient reinforcement learning algorithms in imperfect information games. It relies on an additional critic network trained alongside the policy, rather than on a complicated search-specific training process.

Contribution

The paper proposes a scalable test-time search method using a critic network to estimate values for imperfect information games, improving performance without additional training during search.

Findings

01. Enhanced performance in Leduc hold'em and Goofspiel

02. Scalable approach suitable for large games

03. Effective integration of search with policy networks

Abstract

Search at test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories. Besides the policy network, the algorithm trains an additional critic network, which estimates the expected values of players following various transformations of the policies given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game.…
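
The abstract does not spell out the search procedure, so the following is only a rough illustration of one way critic values for several policy transformations could feed a depth-limited search. It assumes, purely for illustration, that at a depth-limited leaf each player picks one of a few continuation policies and that the critic returns the first player's expected value for every pair. The matrix `critic_leaf_values` is a hypothetical hard-coded stand-in for critic output, and regret matching is a standard solver chosen here for simplicity, not necessarily what the paper uses.

```python
def regret_matching(payoff, iterations=20000):
    """Approximately solve a two-player zero-sum matrix game.

    payoff[i][j] is the row player's value when the row player picks i and
    the column player picks j; the column player receives the negative.
    Returns the average strategies of both players and the resulting value.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    def strategy(regret):
        # Play in proportion to positive regret; uniform if there is none.
        positive = [max(r, 0.0) for r in regret]
        total = sum(positive)
        if total > 0.0:
            return [p / total for p in positive]
        return [1.0 / len(regret)] * len(regret)

    for _ in range(iterations):
        row, col = strategy(row_regret), strategy(col_regret)
        # Value of each pure choice against the opponent's current strategy.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]

    avg_row = [s / iterations for s in row_sum]
    avg_col = [s / iterations for s in col_sum]
    value = sum(avg_row[i] * payoff[i][j] * avg_col[j]
                for i in range(n_rows) for j in range(n_cols))
    return avg_row, avg_col, value


# Hypothetical critic output at one leaf: rows and columns index the
# continuation policy (policy transformation) chosen by each player.
critic_leaf_values = [[2.0, -1.0],
                      [-1.0, 1.0]]

leaf_row, leaf_col, leaf_value = regret_matching(critic_leaf_values)
```

For this toy matrix the exact equilibrium mixes the first option with probability 0.4 for both players and has value 0.2, so `leaf_value` should land close to that. In a real search the leaf value would be backed up through the look-ahead tree rather than read off directly.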

Taxonomy

Topics: Peer-to-Peer Network Technologies