# Ordinal Monte Carlo Tree Search

**Authors:** Tobias Joppen, Johannes Fürnkranz

arXiv: 1901.04274 · 2020-12-09

## TL;DR

This paper introduces an ordinal approach to Monte Carlo Tree Search that improves performance in domains where only the relative ranking of states is meaningful, avoiding biases introduced by numerical rewards.

## Contribution

The paper proposes an ordinal treatment of rewards in MCTS, addressing reward bias issues and demonstrating superior performance over existing MCTS variants in game playing.

## Key findings

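- In the General Video Game Playing framework, ordinal MCTS outperforms preference-based MCTS, vanilla MCTS, and various other MCTS variants.
- Treating rewards ordinally avoids the bias that an arbitrary numerical reward encoding introduces into state evaluation.
- The gains are most relevant in domains where states have only an ordinal ranking and no meaningful distance information between state values.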

## Abstract

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about good rewards, and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework, we show a dominance of our newly proposed ordinal MCTS algorithm over preference-based MCTS, vanilla MCTS and various other MCTS variants.
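The encoding sensitivity described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the rollout outcomes, the action names `safe` and `risky`, and the helpers `mean_value` and `win_probability` are all hypothetical, chosen only to show that averaging numerical rewards can flip an action ranking when a loss is re-encoded from -1 to -0.5, while a comparison that uses only the order of outcomes does not change.

```python
from itertools import product

# Outcomes listed from worst to best; only this order is assumed meaningful.
ORDER = ["loss", "draw", "win"]

def mean_value(outcomes, encoding):
    """Average of numerically encoded outcomes, as a mean-based backup would compute."""
    return sum(encoding[o] for o in outcomes) / len(outcomes)

def win_probability(a, b, order=ORDER):
    """Estimate Pr[outcome of a ranks above outcome of b]; ties count as 0.5."""
    rank = {o: i for i, o in enumerate(order)}
    score = 0.0
    for x, y in product(a, b):
        if rank[x] > rank[y]:
            score += 1.0
        elif rank[x] == rank[y]:
            score += 0.5
    return score / (len(a) * len(b))

# Hypothetical rollout results for two actions (illustrative data only).
safe = ["draw"] * 10
risky = ["win"] * 4 + ["loss"] * 6

# Two numerical encodings that agree on the order of outcomes.
enc_a = {"loss": -1.0, "draw": 0.0, "win": 1.0}
enc_b = {"loss": -0.5, "draw": 0.0, "win": 1.0}

for name, enc in (("loss = -1", enc_a), ("loss = -0.5", enc_b)):
    best = "safe" if mean_value(safe, enc) > mean_value(risky, enc) else "risky"
    print(f"{name}: mean-based choice is {best}")

# The ordinal comparison never looks at the numbers, so it cannot flip.
print("Pr[risky beats safe] =", win_probability(risky, safe))
```

With these made-up rollouts, the mean-based choice is `safe` under the first encoding and `risky` under the second, although both encodings rank the outcomes identically. The pairwise comparison depends only on `ORDER`, so it gives the same answer under any order-preserving encoding.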

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.04274/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1901.04274/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1901.04274/full.md

---
Source: https://tomesphere.com/paper/1901.04274