Convex Hull Monte-Carlo Tree Search

Michael Painter; Bruno Lacerda; Nick Hawes

arXiv:2003.04445·cs.AI·March 24, 2020·1 cites

TL;DR

This paper introduces Zooming CHMCTS, a multi-objective Monte-Carlo Tree Search method that uses convex hulls and contextual bandits to improve planning in stochastic environments with multiple objectives.

Contribution

The paper proposes the Convex Hull Monte-Carlo Tree Search framework and integrates contextual bandit techniques for efficient multi-objective planning in large environments.

Findings

01 Zooming CHMCTS can achieve sublinear contextual regret.

02 It scales better than CHVI on a given computational budget.

03 Both results are demonstrated in the Generalised Deep Sea Treasure environment.

Abstract

This work investigates Monte-Carlo planning for agents in stochastic environments with multiple objectives. We propose the Convex Hull Monte-Carlo Tree Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multi-objective planning solutions as a contextual multi-armed bandit problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget.
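
To make the two central ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes two objectives, treats a context as a non-negative weight vector over those objectives, and uses hypothetical helper names (`convex_coverage_set`, `scalarise`, `contextual_regret`). It shows (a) pruning a set of value vectors down to those on the upper convex hull, i.e. those optimal for at least one weighting, and (b) the regret incurred for one context when a particular value vector is chosen.

```python
from typing import List, Tuple

Vec = Tuple[float, float]  # a value vector for two objectives (assumption)


def cross(o: Vec, a: Vec, b: Vec) -> float:
    """2D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_coverage_set(points: List[Vec]) -> List[Vec]:
    """Keep only vectors that maximise w . v for some weight w >= 0.

    Step 1 drops Pareto-dominated points; step 2 keeps the upper
    convex hull of what remains. Result is sorted by first objective.
    """
    # Step 1: Pareto filter. Scan by first objective descending and keep
    # a point only if it strictly improves the second objective.
    pareto: List[Vec] = []
    best_second = float("-inf")
    for p in sorted(set(points), key=lambda q: (-q[0], -q[1])):
        if p[1] > best_second:
            pareto.append(p)
            best_second = p[1]
    pareto.reverse()  # now ascending in the first objective

    # Step 2: upper hull. Pop the middle point whenever the turn is not
    # strictly clockwise, since no weight can then prefer it.
    hull: List[Vec] = []
    for p in pareto:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def scalarise(w: Vec, v: Vec) -> float:
    """Linear scalarisation of value vector v under context weights w."""
    return w[0] * v[0] + w[1] * v[1]


def contextual_regret(w: Vec, chosen: Vec, hull: List[Vec]) -> float:
    """Gap between the best hull value for context w and the chosen value."""
    return max(scalarise(w, v) for v in hull) - scalarise(w, chosen)


values = [(0, 10), (3, 3), (10, 0), (6, 6), (2, 2), (4, 7)]
hull = convex_coverage_set(values)
print(hull)                                          # [(0, 10), (6, 6), (10, 0)]
print(contextual_regret((0.5, 0.5), (4, 7), hull))   # 0.5
```

In this toy data, `(4, 7)` is Pareto-optimal but lies below the segment joining `(0, 10)` and `(6, 6)`, so no linear weighting prefers it and it is pruned. Summing `contextual_regret` over a sequence of sampled contexts gives a cumulative quantity of the kind the abstract describes as growing sublinearly.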

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning