POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Weichao Mao; Kaiqing Zhang; Qiaomin Xie; Tamer Başar

arXiv:2006.04672 · cs.AI · January 1, 2021 · 1 citation



TL;DR

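This paper introduces POLY-HOOT, a Monte-Carlo planning algorithm for MDPs with continuous state-action spaces. It augments MCTS with Hierarchical Optimistic Optimization (HOO), using a polynomial rather than logarithmic bonus term in the upper confidence bounds, and provides non-asymptotic convergence guarantees together with empirical validation.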

Contribution

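The paper presents POLY-HOOT, which replaces the logarithmic bonus in HOO's upper confidence bounds with a polynomial one and embeds the resulting strategy in MCTS for continuous-space MDPs, with theoretical regret bounds and convergence guarantees.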

Findings

01. POLY-HOOT achieves polynomial convergence rates.

02. The polynomial bonus improves empirical performance.

03. Theoretical regret bounds are established for non-stationary bandits.

Abstract

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of…
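To make the contrast between a logarithmic and a polynomial bonus concrete, the sketch below compares two exploration terms for a node visited n times after t total rounds. It is an illustration only: the logarithmic form is the familiar UCB1 bonus, while the polynomial form `c * t**a / n**b`, the function names, and the exponent values are assumptions chosen for the example, not the specific bonus or constants analyzed in the paper.

```python
import math


def log_bonus(t: int, n: int, c: float = 2.0) -> float:
    """UCB1-style logarithmic bonus: sqrt(c * ln(t) / n)."""
    return math.sqrt(c * math.log(t) / n)


def poly_bonus(t: int, n: int, a: float = 0.25, b: float = 0.5, c: float = 1.0) -> float:
    """Generic polynomial bonus: c * t**a / n**b.

    The exponents a and b are illustrative placeholders (assumptions),
    not the values used in the POLY-HOOT analysis.
    """
    return c * t ** a / n ** b


def ucb_index(empirical_mean: float, bonus: float) -> float:
    """Optimistic index: empirical mean plus an exploration bonus."""
    return empirical_mean + bonus


if __name__ == "__main__":
    n = 10  # visits to one node, held fixed
    for t in (10, 1_000, 100_000):
        print(t, round(log_bonus(t, n), 3), round(poly_bonus(t, n), 3))
```

With the visit count held fixed, the polynomial term keeps growing at a polynomial rate in t, while the logarithmic term grows only like the square root of ln(t). A rarely visited node is therefore pushed back into consideration more aggressively under the polynomial form.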


Videos

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Artificial Intelligence in Games

Methods: Monte-Carlo Tree Search