C-MCTS: Safe Planning with Monte Carlo Tree Search

Dinesh Parthasarathy; Georgios Kontes; Axel Plinge; Christopher Mutschler

arXiv:2305.16209·cs.LG·October 29, 2024·2 cites

TL;DR

C-MCTS introduces a safety critic trained offline to guide Monte Carlo Tree Search in constrained decision-making, improving safety, reward, and efficiency in safety-critical tasks under model mismatch.

Contribution

It proposes Constrained MCTS (C-MCTS) with a safety critic for better safety and efficiency in constrained planning, addressing high variance issues in previous methods.

Findings

1. C-MCTS satisfies cost constraints while achieving higher rewards than previous methods.
2. It operates closer to the constraint boundary, which is what yields the higher reward.
3. It is more robust to model mismatch, resulting in fewer constraint violations.
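For reference, the "constraint boundary" in these findings can be read against the standard CMDP objective. The notation below (policy $\pi$, reward $r$, cost $c$, discount $\gamma$, horizon $T$, cost threshold $\hat{c}$) is assumed for illustration and is not taken from this page:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, c(s_t, a_t)\right] \le \hat{c}
```

Under this reading, a planner operates "closer to the boundary" when its expected cumulative cost approaches $\hat{c}$ without exceeding it, which leaves more room for reward than an overly conservative planner has.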

Abstract

The Constrained Markov Decision Process (CMDP) formulation makes it possible to solve safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as MCTS for solving them. Previous approaches perform conservatively with respect to costs, as they avoid constraint violations by using Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic that is trained with Temporal Difference learning in an offline phase prior to agent deployment. The critic limits exploration by pruning unsafe trajectories within MCTS during deployment. C-MCTS satisfies cost constraints but operates closer to the constraint boundary, achieving higher rewards than previous work. As…
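The two ideas named in the abstract, a cost critic trained offline with Temporal Difference learning and pruning of unsafe actions during search, can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not the authors' implementation: the function names `train_cost_critic` and `safe_actions`, the transition tuple layout, and the toy data are all hypothetical, and a real critic would typically be a learned function approximator queried inside the MCTS expansion step.

```python
from collections import defaultdict


def train_cost_critic(transitions, gamma=1.0, alpha=0.1, epochs=200):
    """Estimate expected cumulative cost Q_c(s, a) with TD(0) updates.

    transitions: offline data as (s, a, cost, s_next, a_next, done) tuples
    (hypothetical layout, chosen for this sketch).
    """
    q_c = defaultdict(float)
    for _ in range(epochs):
        for s, a, cost, s_next, a_next, done in transitions:
            # Bootstrap from the next state-action pair unless the episode ended.
            bootstrap = 0.0 if done else gamma * q_c[(s_next, a_next)]
            td_error = cost + bootstrap - q_c[(s, a)]
            q_c[(s, a)] += alpha * td_error
    return q_c


def safe_actions(q_c, state, actions, cost_so_far, cost_budget):
    """Prune actions whose predicted total cost would exceed the budget."""
    return [a for a in actions if cost_so_far + q_c[(state, a)] <= cost_budget]


# Toy offline data: "safe" ends the episode at zero cost,
# "risky" incurs cost 1 and is followed by "go", which incurs cost 1 more.
transitions = [
    (0, "safe", 0.0, None, None, True),
    (0, "risky", 1.0, 1, "go", False),
    (1, "go", 1.0, None, None, True),
]
q_c = train_cost_critic(transitions)

# During tree expansion, only actions that pass the critic would be added.
print(safe_actions(q_c, 0, ["safe", "risky"], cost_so_far=0.0, cost_budget=1.5))
```

With this toy data the critic learns a cumulative cost of about 2 for "risky", so a budget of 1.5 leaves only "safe" available for expansion. Because the estimate comes from a trained critic and not from a handful of noisy rollouts, the planner does not need a large safety margin, which is the intuition behind operating closer to the constraint boundary.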

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- C-MCTS improves solution quality without violating the cost constraints.

Weaknesses

- The actual running time of the experiments needs to be provided.
- Several points in the explanation need to be clarified, as described in the Questions below.

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

**Originality:** While not introducing a completely novel approach, the authors apply an offline learning technique to estimate costs in CMDPs online.

**Significance:** The contributions of the paper lack significance.

**Clarity:** The paper is understandable.

Weaknesses

The main drawback of the paper is its lack of significance. The approach introduced is not novel: learning value or cost estimates offline and applying them online is not a new idea, nor does the learning approach use a novel technique. I also question the soundness of the analysis. In Prop. 1, the authors claim that at each iteration of their algorithm, they are guaranteed to find the optimal solution. They base their claim on the proof in [Kocsis & Szepesvari, 2006]. However, that work states that

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

Dealing with safety constraints is probably one of the weaknesses of RL approaches, and one of the main obstacles to applying RL and planning algorithms like MCTS in real-world scenarios. While in many of those scenarios the algorithms face continuous state/action spaces, tackling the issue in discrete spaces is also important. The extension of MCTS to constrained MDPs seems fairly reasonable.

Weaknesses

While there is some theoretical work included, it does not offer sufficient guarantees for practical applicability. The proposed algorithm could be a step towards a practical application, but it is not there yet. Given that this is a largely empirical article, the experimental evaluation is rather small. The benchmarks are small and fairly simple, and the set of baselines is also limited.

Code & Models

Repositories

mutschcr/c-mcts (Official)

Taxonomy

Topics: Software Reliability and Analysis Research · Adversarial Robustness in Machine Learning · AI-based Problem Solving and Planning

Methods: Pruning