How to Combine Tree-Search Methods in Reinforcement Learning

Yonathan Efroni; Gal Dalal; Bruno Scherrer; Shie Mannor

arXiv:1809.01843·cs.LG·February 19, 2019

Open Access

TL;DR

This paper proposes a new method for combining tree-search techniques in reinforcement learning that guarantees convergence by backing up values along the optimal path, improving the stability of lookahead policies.

Contribution

It introduces a simple enhancement to existing tree search methods that ensures contraction and convergence, supported by theoretical analysis and convergence rates.

Findings

01

The proposed method guarantees $\gamma^h$-contraction.

02

Convergence rates are established for noisy environments.

03

The enhancement improves the stability of lookahead policies.

Abstract

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves while the information obtained at the root is not leveraged other than for updating the policy. Here, we question the potency of this approach. Namely, the latter procedure is non-contractive in general, and its convergence is not guaranteed. Our proposed enhancement is straightforward and simple: use the return from the optimal tree path to back up the values at the descendants of the root. This leads to a $γ^{h}$-contracting procedure, where $γ$ is the discount factor and $h$ is the…
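
The $γ^{h}$-contraction claim can be checked numerically on a toy problem. The sketch below is not the paper's algorithm. It assumes a small tabular MDP with a known model, and it assumes that the optimal depth-$h$ tree return from a state equals $h$ applications of the Bellman optimality operator to the leaf values. The function names (`bellman_optimal`, `h_step_backup`, `random_mdp`) and the MDP itself are hypothetical and chosen only for illustration.

```python
import random

def bellman_optimal(v, P, R, gamma):
    """Apply the Bellman optimality operator once.

    P[s][a] is a list of next-state probabilities and R[s][a] is a scalar
    reward (both are assumed inputs for this illustration).
    """
    return [
        max(
            R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]

def h_step_backup(v, P, R, gamma, h):
    """Return of the best depth-h tree path from every state, with v at the leaves."""
    for _ in range(h):
        v = bellman_optimal(v, P, R, gamma)
    return v

def sup_dist(u, v):
    """Max-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def random_mdp(n_states, n_actions, rng):
    """Build a random tabular MDP (hypothetical test problem)."""
    P, R = [], []
    for _ in range(n_states):
        rows, rewards = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalise to a distribution
            rewards.append(rng.random())
        P.append(rows)
        R.append(rewards)
    return P, R

rng = random.Random(0)
gamma, h = 0.9, 3
P, R = random_mdp(4, 2, rng)
v1 = [rng.uniform(-1, 1) for _ in range(4)]
v2 = [rng.uniform(-1, 1) for _ in range(4)]

before = sup_dist(v1, v2)
after = sup_dist(h_step_backup(v1, P, R, gamma, h),
                 h_step_backup(v2, P, R, gamma, h))

# The max-norm gap must shrink by at least a factor of gamma**h.
contraction_holds = after <= gamma ** h * before + 1e-12
print(contraction_holds)
```

Because the one-step operator is a $γ$-contraction in the max norm, composing it $h$ times gives the $γ^{h}$ factor, so the check should hold for any pair of starting value vectors, not only the random ones drawn here.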

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research