Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search

Jiamian Li

arXiv:2410.11642·cs.LG·October 24, 2024

TL;DR

This paper introduces a novel Monte Carlo Tree Search-based method that improves Q value estimation and reshapes rewards in imperfect information games. Evaluated on Uno, it outperforms traditional approaches.

Contribution

The paper proposes a new algorithm that combines Monte Carlo Tree Search with Q-learning to reduce Q value overestimation and reshape rewards in imperfect information games.
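Overestimation, one of the two problems the paper targets, arises because taking a max over noisy value estimates is biased upward. The toy simulation below is not the paper's algorithm. It illustrates two general remedies: the double estimator behind Double Q-learning, and averaging several estimates before maximizing, which loosely mirrors the abstract's idea of averaging value estimations. The helper simulate_bias and its settings (10 actions, N(0, 1) noise, all true values equal to zero) are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_bias(n_actions=10, n_samples=1, trials=2000, seed=0):
    """Toy model of value overestimation (hypothetical helper, not from the paper).

    Every action's true value is 0. Each value estimate is the mean of
    `n_samples` noisy N(0, 1) draws. Returns the average bias of:
      * the single estimator: max over one set of estimates;
      * the double estimator: argmax chosen from set A, value read from an
        independent set B (the idea behind Double Q-learning).
    """
    rng = random.Random(seed)

    def estimates():
        # One noisy estimate per action, each averaged over n_samples draws.
        return [
            statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
            for _ in range(n_actions)
        ]

    single, double = [], []
    for _ in range(trials):
        est_a, est_b = estimates(), estimates()
        single.append(max(est_a))                              # select and evaluate with the same noise
        best = max(range(n_actions), key=est_a.__getitem__)   # select with set A
        double.append(est_b[best])                             # evaluate with independent set B
    return statistics.fmean(single), statistics.fmean(double)


for k in (1, 4, 16):
    single_bias, double_bias = simulate_bias(n_samples=k)
    print(f"samples={k:2d}  single bias={single_bias:.3f}  double bias={double_bias:.3f}")
```

Since every true value is zero, any nonzero mean is pure bias. The single estimator's bias starts well above zero and shrinks in proportion to 1/sqrt(k) as more estimates are averaged, while the double estimator's bias stays near zero.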

Findings

01

Our method outperforms traditional algorithms in Uno.

02

Performance gains grow as the number of players, and hence the game's difficulty, increases.

03

The approach is generalizable to other Q value estimation algorithms.

Abstract

Reinforcement learning has achieved remarkable success in perfect information games such as Go and Atari, enabling agents to compete at the highest levels against human players. However, research on reinforcement learning for imperfect information games has been relatively limited because of their more complex game structures and randomness. Traditional methods struggle to train and improve performance in imperfect information games due to issues such as inaccurate Q value estimation and reward sparsity. In this paper, we focus on Uno, an imperfect information game, and aim to address these problems by reducing Q value overestimation and reshaping the reward function. We propose a novel algorithm that uses Monte Carlo Tree Search to average the value estimations in the Q function. Even though we choose Double Deep Q Learning as the foundational framework in this paper, our method can be…
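Double Deep Q Learning, which the abstract names as the foundational framework, reduces overestimation by letting the online network choose the next action while the target network scores it. The sketch below shows that standard target alongside one hypothetical way to fold MCTS estimates into it. The abstract is truncated and does not state the paper's averaging rule, so averaged_target, its equal weighting, and the mcts_returns input are illustrative assumptions, not the author's method.

```python
from statistics import fmean
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Standard Double DQN target: the online network selects the next
    action and the target network evaluates it."""
    if done:
        return reward  # no bootstrap from a terminal state
    return reward + gamma * q_target_next[greedy_action(q_online_next)]


def averaged_target(reward: float, done: bool, gamma: float,
                    q_online_next: Sequence[float],
                    q_target_next: Sequence[float],
                    mcts_returns: Sequence[float]) -> float:
    """HYPOTHETICAL variant (an assumption, not the paper's rule): bootstrap
    from the equal-weight average of the target network's estimate and the
    mean of MCTS simulation returns for the next state."""
    if done:
        return reward
    q_estimate = q_target_next[greedy_action(q_online_next)]
    bootstrap = 0.5 * (q_estimate + fmean(mcts_returns))
    return reward + gamma * bootstrap


q_online_next = [1.0, 3.0, 2.0]  # online network prefers action 1
q_target_next = [0.5, 2.0, 4.0]  # target network values action 1 at 2.0
print(double_dqn_target(1.0, False, 0.5, q_online_next, q_target_next))              # 2.0
print(averaged_target(1.0, False, 0.5, q_online_next, q_target_next, [0.0, 1.0]))   # 1.625
```

For comparison, a vanilla DQN target that takes the max over q_target_next would bootstrap from 4.0 and return 3.0. Decoupling action selection from evaluation is what tempers such optimistic values.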
