Ordinal Monte Carlo Tree Search

Tobias Joppen; Johannes Fürnkranz

arXiv:2101.10670 · cs.AI · January 27, 2021

TL;DR

This paper introduces an ordinal variant of Monte Carlo Tree Search (MCTS) for domains where states have only an ordinal ranking, so that any numerical reward encoding is biased, and reports that it outperforms traditional MCTS variants in game playing.

Contribution

The paper proposes an ordinal MCTS algorithm and a new bandit algorithm for decision-making in environments with ordinal rewards, and shows their effectiveness in game playing.

Findings

01 Ordinal MCTS outperforms traditional MCTS variants in experiments.

02 The new bandit algorithm demonstrates better performance than UCB.

03 Ordinal treatment reduces reward bias in state ranking scenarios.
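
One way to see why an ordinal treatment can sidestep encoding bias is to compare two actions using only the order of their outcomes. The sketch below is an illustrative assumption, not necessarily the bandit rule used in the paper: the scale `ORDER`, the sample lists, and the helper `win_probability` are all hypothetical. It estimates how often a random outcome of one action ranks above a random outcome of the other, counting ties as one half.

```python
from itertools import product

# Hypothetical ordinal scale, worst to best. Only the order is used, never a distance.
ORDER = ["lose", "draw", "win"]
RANK = {outcome: position for position, outcome in enumerate(ORDER)}

def win_probability(xs, ys):
    """Fraction of (x, y) pairs in which x ranks above y; ties count as 0.5."""
    pairs = list(product(xs, ys))
    score = 0.0
    for x, y in pairs:
        if RANK[x] > RANK[y]:
            score += 1.0
        elif RANK[x] == RANK[y]:
            score += 0.5
    return score / len(pairs)

# Hypothetical outcome samples for two actions.
action_a = ["draw"] * 10
action_b = ["win"] * 4 + ["lose"] * 6

p_a_beats_b = win_probability(action_a, action_b)
p_b_beats_a = win_probability(action_b, action_a)
print(p_a_beats_b, p_b_beats_a)
```

Because the comparison reads only `RANK`, any re-encoding of the outcomes that preserves their order leaves the result unchanged.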

Abstract

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals one and losing equals minus one, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings. It is hard to argue about good rewards and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper we take a look at MCTS, a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of rewards, and show…
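
The encoding dependence described in the abstract can be illustrated with a small sketch. Everything in it is a made-up assumption for illustration: the two outcome samples, the two encodings, and the helper names `mean_value` and `preferred`. Both encodings respect the same order (lose < draw < win), yet an agent that compares average rewards picks a different action under each.

```python
from statistics import mean

# Hypothetical outcome samples for two actions on the scale lose < draw < win.
action_a = ["draw"] * 10
action_b = ["win"] * 4 + ["lose"] * 6

def mean_value(outcomes, encoding):
    """Average numerical reward of the outcomes under a given encoding."""
    return mean(encoding[outcome] for outcome in outcomes)

def preferred(encoding):
    """Action that a mean-maximizing agent would pick under this encoding."""
    value_a = mean_value(action_a, encoding)
    value_b = mean_value(action_b, encoding)
    return "a" if value_a > value_b else "b"

# Two order-preserving encodings that differ only in how much a win is worth.
encoding_1 = {"lose": -1, "draw": 0, "win": 1}
encoding_2 = {"lose": -1, "draw": 0, "win": 2}

print(preferred(encoding_1), preferred(encoding_2))
```

Under `encoding_1` the risky action averages below zero and the safe action is chosen. Raising the value of a win to 2 lifts the risky action's average above zero and flips the choice, although the ranking of outcomes never changed.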
