Preference-Based Monte Carlo Tree Search
Tobias Joppen, Christian Wirth, and Johannes Fürnkranz

TL;DR
This paper introduces a preference-based Monte Carlo Tree Search (MCTS) variant that relies solely on qualitative feedback. This makes it applicable to domains where numerical rewards are hard to define, and it performs comparably to traditional MCTS.
Contribution
The paper proposes a novel MCTS variant that uses only qualitative, preference-based feedback, extending MCTS to settings where only ordinal feedback is available.
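The summary does not spell out how a tree node can choose actions without numeric values. One possibility, sketched here under stated assumptions, is to treat each node as a dueling bandit. The node records which of two child actions produced the preferred rollout and selects pairs to compare optimistically, in the spirit of the Relative Upper Confidence Bound (RUCB) dueling-bandit algorithm. The class `PreferenceNode`, its methods, and the exploration constant `c` are hypothetical. This simplified rule should not be read as the authors' exact procedure.

```python
import math
import random


class PreferenceNode:
    """Hypothetical tree node that learns only from pairwise preferences
    between rollouts started through its child actions (no numeric values)."""

    def __init__(self, n_actions: int, c: float = 1.0):
        if n_actions < 2:
            raise ValueError("need at least two actions to compare")
        self.n = n_actions
        self.c = c  # exploration constant (an assumption, not taken from the paper)
        # wins[i][j]: how often a rollout through action i was preferred over one through j
        self.wins = [[0.0] * n_actions for _ in range(n_actions)]
        self.t = 0  # number of comparisons recorded at this node

    def _optimistic(self, i: int, j: int) -> float:
        """Upper confidence bound on the probability that action i beats action j."""
        total = self.wins[i][j] + self.wins[j][i]
        if total == 0:
            return 1.0  # never compared: be maximally optimistic
        mean = self.wins[i][j] / total
        return mean + math.sqrt(self.c * math.log(self.t + 1) / total)

    def select_pair(self) -> tuple[int, int]:
        """Choose two distinct actions to compare (simplified RUCB-style rule)."""
        # Candidates: actions that could still plausibly beat every other action.
        candidates = [
            i for i in range(self.n)
            if all(self._optimistic(i, j) >= 0.5 for j in range(self.n) if j != i)
        ]
        first = random.choice(candidates) if candidates else random.randrange(self.n)
        # Challenger: the action with the best optimistic chance of beating `first`.
        second = max((j for j in range(self.n) if j != first),
                     key=lambda j: self._optimistic(j, first))
        return first, second

    def update(self, i: int, j: int, preference: int) -> None:
        """Record one ordinal outcome: +1 if i's rollout was preferred, -1 if j's, 0 for a tie."""
        if preference > 0:
            self.wins[i][j] += 1
        elif preference < 0:
            self.wins[j][i] += 1
        else:
            self.wins[i][j] += 0.5
            self.wins[j][i] += 0.5
        self.t += 1

    def best_action(self) -> int:
        """Copeland-style recommendation: the action beating the most others on empirical win rate."""
        def copeland(i: int) -> int:
            score = 0
            for j in range(self.n):
                total = self.wins[i][j] + self.wins[j][i]
                if j != i and total > 0 and self.wins[i][j] / total > 0.5:
                    score += 1
            return score
        return max(range(self.n), key=copeland)


# Usage: action 2 is always preferred, and action 0 is preferred over action 1.
node = PreferenceNode(3)
for _ in range(5):
    node.update(2, 0, +1)
    node.update(2, 1, +1)
    node.update(0, 1, +1)
print(node.best_action())  # 2
print(node.select_pair())  # two distinct action indices (randomized)
```

The node never stores or averages a reward. Its only statistic is a matrix of pairwise win counts, so any feedback source that can say "this rollout was better than that one" is sufficient.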
Findings
Preference-based MCTS performs comparably to traditional MCTS.
Ordinal feedback can be effectively used in MCTS.
The approach extends MCTS to domains where numerical rewards are hard to define.
Abstract
Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define. Real-time MCTS is a variant which may only rarely encounter states with an explicit, extrinsic reward. To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent. Recent work has shown evidence that in different areas the underlying structure is ordinal and not numerical. Hence erroneous and biased heuristics are inevitable, especially in such domains. In this paper, we propose an MCTS variant which only depends on qualitative feedback, and therefore opens up new applications for MCTS. We also find indications that translating absolute into ordinal feedback may be beneficial. Using a puzzle domain, we show that our…
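The abstract notes that translating absolute into ordinal feedback may be beneficial. Below is a minimal sketch of such a translation, assuming feedback arrives as two numeric scores to compare. The names `to_preference` and `preferences` and the tie tolerance `tol` are hypothetical and do not come from the paper. Only the sign of the score difference is kept. With `tol=0`, any strictly increasing rescaling of a heuristic therefore yields exactly the same preferences.

```python
from typing import Callable, Iterable


def to_preference(score_a: float, score_b: float, tol: float = 0.0) -> int:
    """Turn two absolute scores into an ordinal preference.

    Returns +1 if outcome a is preferred, -1 if outcome b is preferred, 0 for a tie.
    `tol` (hypothetical parameter) treats near-equal scores as ties.
    """
    diff = score_a - score_b
    if diff > tol:
        return 1
    if diff < -tol:
        return -1
    return 0


def preferences(pairs: Iterable[tuple[float, float]],
                transform: Callable[[float], float] = lambda x: x) -> list[int]:
    """Preferences for each (a, b) pair after applying `transform` to both scores."""
    return [to_preference(transform(a), transform(b)) for a, b in pairs]


pairs = [(3.0, 1.0), (0.5, 2.0), (4.0, 4.0)]
print(preferences(pairs))                        # [1, -1, 0]
# Strictly increasing rescalings change the numeric values of the heuristic
# but not the resulting preferences: only the ordering survives.
print(preferences(pairs, lambda x: x ** 3))      # [1, -1, 0]
print(preferences(pairs, lambda x: 10 * x - 7))  # [1, -1, 0]
```

This invariance holds only for `tol=0`. A nonzero tolerance is measured on the transformed scale, so a nonlinear rescaling can turn a strict preference into a tie or vice versa.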
