How to Combine Tree-Search Methods in Reinforcement Learning
Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

TL;DR
This paper proposes a new method for combining tree-search techniques in reinforcement learning that guarantees convergence by backing up values along the optimal path, improving the stability of lookahead policies.
Contribution
It introduces a simple enhancement to existing tree search methods that ensures contraction and convergence, supported by theoretical analysis and convergence rates.
Findings
The proposed method guarantees $\gamma^h$-contraction.
Convergence rates are established for noisy environments.
The enhancement improves the stability of lookahead policies.
Abstract
Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves while the information obtained at the root is not leveraged other than for updating the policy. Here, we question the potency of this approach. Namely, the latter procedure is non-contractive in general, and its convergence is not guaranteed. Our proposed enhancement is straightforward and simple: use the return from the optimal tree path to back up the values at the descendants of the root. This leads to a $\gamma^h$-contracting procedure, where $\gamma$ is the discount factor and $h$ is the…
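The contraction property mentioned in the abstract can be illustrated numerically. The sketch below is not the paper's algorithm. It assumes a small tabular MDP with known transitions and rewards, and it stands in for tree search by applying the exact Bellman optimality operator `h` times, which gives the value of the best depth-`h` path with the current estimate at the leaves. All names (`bellman_optimal`, `h_step_backup`, `random_mdp`) and the random MDP are hypothetical and chosen only for illustration. Since one application of the operator is a `gamma`-contraction in the sup norm, `h` applications should shrink the distance between any two value estimates by at least a factor of `gamma ** h`.

```python
import random


def bellman_optimal(v, P, R, gamma):
    """Apply the Bellman optimality operator once on a tabular MDP.

    P[s][a][t] is the probability of moving from s to t under action a,
    R[s][a] is the immediate reward, v is the current value estimate.
    """
    n_states = len(v)
    n_actions = len(R[0])
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def h_step_backup(v, P, R, gamma, h):
    """Apply the operator h times (an exact stand-in for depth-h search)."""
    for _ in range(h):
        v = bellman_optimal(v, P, R, gamma)
    return v


def sup_norm_diff(u, v):
    """Largest absolute difference between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def random_mdp(n_states, n_actions, rng):
    """Build a random MDP (illustrative assumption, not from the paper)."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])  # normalize to a distribution
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


rng = random.Random(0)
gamma, h = 0.9, 3
P, R = random_mdp(4, 2, rng)

# Two arbitrary value estimates to compare before and after the backup.
u = [rng.uniform(-1, 1) for _ in range(4)]
v = [rng.uniform(-1, 1) for _ in range(4)]

before = sup_norm_diff(u, v)
after = sup_norm_diff(
    h_step_backup(u, P, R, gamma, h),
    h_step_backup(v, P, R, gamma, h),
)

# Small tolerance for floating-point rounding.
contraction_holds = after <= gamma ** h * before + 1e-12
print(f"before={before:.4f} after={after:.4f} bound={gamma ** h * before:.4f}")
```

Checking the bound on one random instance is only a sanity check, not a proof. The paper's guarantee concerns its specific backup procedure, whereas this sketch only shows the generic fact that composing a `gamma`-contraction `h` times gives a `gamma ** h`-contraction.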
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
