Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning

Conor F. Hayes; Mathieu Reymond; Diederik M. Roijers; Enda Howley; Patrick Mannion

arXiv:2211.13032·cs.AI·December 7, 2022

Open Access

TL;DR

This paper introduces two novel Monte Carlo tree search algorithms designed for risk-aware and multi-objective reinforcement learning, effectively handling the distribution of returns rather than just their expected value.

Contribution

The paper proposes NLU-MCTS and DMCTS algorithms that optimize policies for nonlinear utility functions and approximate return distributions, advancing risk-aware and multi-objective RL.

Findings

01 Both algorithms outperform state-of-the-art methods in expected utility optimization.

02 DMCTS effectively uses Thompson sampling for risk-aware decision making.

03 The methods are applicable in single-execution scenarios like medical treatments.
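
To make the Thompson sampling finding concrete, the sketch below shows one generic way to pick an action by sampling a plausible mean utility per action. It is an illustration only, not the DMCTS procedure from the paper: the function name `thompson_select`, the dictionary layout, and the use of bootstrap resampling as a stand-in for a posterior draw are all assumptions made for this example.

```python
import random


def thompson_select(utilities_by_action, rng):
    """Pick an action by drawing one plausible mean utility per action.

    utilities_by_action: hypothetical mapping from action to the list of
    utilities observed so far when that action was taken.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    best_action, best_sample = None, float("-inf")
    for action, observed in utilities_by_action.items():
        if not observed:
            # Assumption: try any action with no data before sampling.
            return action
        # Bootstrap resample as a stand-in for a posterior draw
        # (an assumption of this sketch, not taken from the paper).
        resample = rng.choices(observed, k=len(observed))
        sample_mean = sum(resample) / len(resample)
        if sample_mean > best_sample:
            best_action, best_sample = action, sample_mean
    return best_action


rng = random.Random(0)
stats = {"treat_a": [1.0, 1.0, 1.0], "treat_b": [0.0, 0.0, 0.0]}
choice = thompson_select(stats, rng)
```

Because each call draws a fresh sample, actions with few or widely spread observations are still chosen occasionally, which is what gives Thompson sampling its exploration behaviour.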

Abstract

In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can…
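
The abstract's central point, that the expected return can hide outcomes that matter when a policy is executed only once, can be illustrated with a small worked example. The sketch below is not from the paper: the two return distributions, the square-root utility, and the function names are hypothetical choices made purely for illustration.

```python
import math

# Hypothetical return distributions as (probability, future_return) pairs.
safe = [(1.0, 5.0)]
risky = [(0.5, 0.0), (0.5, 12.0)]


def utility(total_return):
    # Assumed concave (risk-averse) utility; any nonlinear utility would do.
    return math.sqrt(total_return)


def expected_return(dist):
    # The "value" in standard reinforcement learning.
    return sum(p * r for p, r in dist)


def expected_utility(dist, accrued=0.0):
    # Utility is applied to accrued plus future return for each outcome,
    # then averaged over outcomes.
    return sum(p * utility(accrued + r) for p, r in dist)


prefers_risky_by_mean = expected_return(risky) > expected_return(safe)
prefers_safe_by_utility = expected_utility(safe) > expected_utility(risky)
prefers_risky_when_rich = (
    expected_utility(risky, accrued=100.0) > expected_utility(safe, accrued=100.0)
)
```

Under these assumptions the risky option has the higher expected return, yet the safe option has the higher expected utility when nothing has been accrued. With a large accrued return the preference flips back to the risky option, which is why both the accrued and the future returns need to be considered at decision time.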

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics