Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

Tetsuro Morimura; Kazuhiro Ota; Kenshi Abe; Peinan Zhang

arXiv:2206.01011·cs.LG·July 8, 2024

Open Access

TL;DR

This paper introduces Monte Carlo Tree Learning (MCTL), an adaptation of Monte Carlo Tree Search (MCTS) for online reinforcement learning, and combines it with policy gradient methods to improve learning in non-Markov decision processes. Numerical experiments support the combined approach.

Contribution

The paper proposes MCTL for online RL, combines it with policy gradient algorithms, and provides a convergence analysis together with empirical validation.

Findings

01. MCTL adapts MCTS, a decision-time planning method, to the online RL setting.

02. The combined PG and MCTL policy converges asymptotically under conditions derived in the paper.

03. Numerical experiments demonstrate the method's effectiveness.

Abstract

Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model for an expected return using gradient ascent. While PG can work well even in non-Markovian environments, it may encounter plateaus or peakiness issues. As another successful RL approach, algorithms based on Monte Carlo Tree Search (MCTS), which include AlphaZero, have obtained groundbreaking results, especially in the game-playing domain. They are also effective when applied to non-Markov decision processes. However, the standard MCTS is a method for decision-time planning, which differs from the online RL setting. In this work, we first introduce Monte Carlo Tree Learning (MCTL), an adaptation of MCTS for online RL setups. We then explore a combined policy approach of PG and MCTL to leverage their strengths. We derive conditions for asymptotic convergence with the results of a…
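As a rough illustration of the abstract's first sentence (optimizing a parameterized policy for expected return by gradient ascent), the sketch below runs exact gradient ascent on a softmax policy. It is not the paper's algorithm: the single-step, two-action task, the reward values, the step size, and all function names are assumptions chosen only to keep the example small, and the setting has no history dependence or tree search.

```python
import math


def softmax(theta):
    """Turn preference parameters into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_return(theta, rewards):
    """J(theta) = sum_a pi(a) * r(a) for a single-step task."""
    pi = softmax(theta)
    return sum(p * r for p, r in zip(pi, rewards))


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy: dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def gradient_ascent(theta, rewards, step_size=0.5, n_steps=200):
    """Repeatedly move theta along the gradient and record J after each step."""
    theta = list(theta)
    history = [expected_return(theta, rewards)]
    for _ in range(n_steps):
        grad = policy_gradient(theta, rewards)
        theta = [t + step_size * g for t, g in zip(theta, grad)]
        history.append(expected_return(theta, rewards))
    return theta, history


# Hypothetical task: action 1 pays 1.0, action 0 pays 0.0.
rewards = [0.0, 1.0]
theta, history = gradient_ascent([0.0, 0.0], rewards)
```

In this toy setting each gradient component is scaled by the action probability, so the updates shrink as the policy becomes nearly deterministic. That is one simple way to see how slow progress can arise in softmax policy gradient, although the plateau and peakiness issues discussed in the paper concern a more general setting.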

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance

Methods: AlphaZero · Monte-Carlo Tree Search