Combinatorial Multi-armed Bandits for Real-Time Strategy Games

Santiago Ontañón

arXiv:1710.04805·cs.AI·October 16, 2017

TL;DR

This paper introduces naïve sampling, a sampling strategy based on combinatorial multi-armed bandits, to improve Monte Carlo Tree Search in real-time strategy games with large branching factors.

Contribution

It provides a theoretical analysis of naïve sampling variants and demonstrates their effectiveness in RTS games with large branching factors.

Findings

01

Naïve sampling outperforms other strategies as branching factor increases.

02

Theoretical properties of naïve sampling variants are analyzed.

03

Empirical results show improved performance in RTS game scenarios.

Abstract

Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multi-armed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naïve sampling outperforms the other sampling strategies.
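The abstract does not spell out how naïve sampling works, so the following is only a minimal sketch under stated assumptions. It assumes the usual formulation in which a macro-arm is a tuple of values, one per variable; each variable gets its own local bandit (the "naïve" assumption being that the reward decomposes roughly additively across variables); and a global bandit tracks the macro-arms tried so far. The ε-greedy policies, the parameter values, and all names (`NaiveSampler`, `eps_0`, `eps_l`, `eps_g`, `toy_reward`) are illustrative choices, not taken from the paper.

```python
import random
from collections import defaultdict


class NaiveSampler:
    """Illustrative epsilon-greedy naive sampling over a combinatorial arm space."""

    def __init__(self, arities, eps_0=0.25, eps_l=0.25, eps_g=0.0, seed=None):
        self.arities = list(arities)  # number of values each variable can take
        self.eps_0, self.eps_l, self.eps_g = eps_0, eps_l, eps_g
        self.rng = random.Random(seed)
        # Statistics are [pull_count, total_reward].
        self.local = [defaultdict(lambda: [0, 0.0]) for _ in self.arities]
        self.global_stats = defaultdict(lambda: [0, 0.0])

    @staticmethod
    def _greedy(stats):
        # Key with the highest mean reward; every stored key has count >= 1.
        return max(stats, key=lambda k: stats[k][1] / stats[k][0])

    def _eps_greedy(self, stats, eps, candidates):
        if not stats or self.rng.random() < eps:
            return self.rng.choice(candidates)
        return self._greedy(stats)

    def select(self):
        if not self.global_stats or self.rng.random() < self.eps_0:
            # Explore: each local bandit picks its value independently,
            # and the picks are combined into one macro-arm.
            return tuple(
                self._eps_greedy(self.local[i], self.eps_l, range(n))
                for i, n in enumerate(self.arities)
            )
        # Exploit: choose among macro-arms that were already sampled.
        return self._eps_greedy(
            self.global_stats, self.eps_g, list(self.global_stats)
        )

    def update(self, macro_arm, reward):
        # The same reward is credited to every local value and to the macro-arm.
        for i, value in enumerate(macro_arm):
            entry = self.local[i][value]
            entry[0] += 1
            entry[1] += reward
        entry = self.global_stats[macro_arm]
        entry[0] += 1
        entry[1] += reward

    def recommend(self):
        # Assumption: recommend the sampled macro-arm with the best mean reward.
        return self._greedy(self.global_stats)


def toy_reward(macro_arm, target=(2, 1, 0)):
    """Hypothetical additive reward: fraction of variables matching a target."""
    return sum(a == t for a, t in zip(macro_arm, target)) / len(target)


if __name__ == "__main__":
    sampler = NaiveSampler([3, 4, 2], seed=0)
    for _ in range(500):
        arm = sampler.select()
        sampler.update(arm, toy_reward(arm))
    print(sampler.recommend())
```

The point of the design is that the local bandits never enumerate the full product space: here there are 3 × 4 × 2 = 24 macro-arms, but exploration only maintains 3 + 4 + 2 local statistics, and the global bandit grows only with the macro-arms actually sampled.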

Taxonomy

Topics: Advanced Bandit Algorithms Research · Artificial Intelligence in Games · Reinforcement Learning in Robotics