Interpretable Contrastive Monte Carlo Tree Search Reasoning

Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen

arXiv:2410.01707 · cs.CL · December 30, 2024

TL;DR

This paper introduces SC-MCTS*, an improved Monte Carlo Tree Search algorithm for large language models that raises reasoning accuracy and speed through an interpretable reward model and optimized MCTS components.

Contribution

The paper presents a novel, interpretable reward model and optimized strategies for MCTS, significantly boosting reasoning performance and efficiency in LLMs.

Findings

01. 51.9% average speed improvement per node
02. 17.4% performance increase on the Blocksworld dataset
03. Enhanced interpretability and component analysis of MCTS

Abstract

We propose SC-MCTS*: a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation is threefold: (1) previous work on MCTS-based LLM reasoning often overlooked its biggest drawback, namely slower speed than CoT; (2) previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation of its components from a reasoning-interpretability perspective; (3) the reward model is the most crucial component in MCTS, yet previous work has rarely studied or improved MCTS reward models in depth. We therefore conducted extensive ablation studies and quantitative analysis on the components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable…
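The abstract refers to the components of MCTS without spelling out the search loop. As background, here is a minimal sketch of generic MCTS selection and backpropagation using the standard UCT rule. It is not SC-MCTS* itself; the `Node` class, the exploration constant `c = 1.41`, and the toy rewards are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A search-tree node; `state` stands in for a partial reasoning chain (hypothetical)."""
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        # Mean backed-up reward; 0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Standard UCT: mean reward (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    return child.q() + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(root: Node, c: float = 1.41) -> Node:
    """Descend from the root, taking the max-UCT child until a leaf is reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits, c))
    return node


def backpropagate(leaf: Node, reward: float) -> None:
    """Add the leaf's reward to every node on the path back to the root."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent
```

In a full search, the reward passed to `backpropagate` would come from the reward model (the component the paper argues matters most), and expansion would ask the LLM for candidate next steps; both are omitted here to keep the sketch focused on the tree statistics.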

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 3

Strengths

- I like the overall methodology of reconsidering different components of MCTS and improving them.
- The achieved empirical performance is impressive, especially compared to o1.

Weaknesses

- The setup and requirements are not explained well. The introduction focuses on MCTS and its drawbacks but does not clearly explain the authors' goal, objective, and setup. Only in line 148 do they mention that the focus is on using existing LLMs to achieve better reasoning, but this is also not clearly presented. I would appreciate it if the goal were explained in the introduction (along with a high-level setup and an example of a known expert). Additionally, in section 3.1, the authors could exp

Reviewer 02 · Rating 6 · Confidence 2

Strengths

The authors introduce several improvements to previous MCTS methods, and verify these improvements via an ablation study. In particular, their method of combining three reward signals and adaptively weighting them is interesting and (to my knowledge) novel. They demonstrate a clear improvement on the Blocksworld dataset against RAP-MCTS as well as CoT.
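The review mentions combining three reward signals with adaptive weighting, but this page does not give the paper's normalization or weighting scheme. The sketch below illustrates one plausible way to make heterogeneous signals comparable: z-score each signal against its own running statistics (Welford's algorithm), then take a weighted sum. The signal names, the weights, and the z-score choice are all assumptions; the only adaptivity here comes from the running statistics, not from learned weights.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunningStats:
    """Welford's online mean/variance for one reward signal."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample std; fall back to 1.0 until there is spread to measure.
        if self.n < 2 or self.m2 == 0.0:
            return 1.0
        return math.sqrt(self.m2 / (self.n - 1))


class CombinedReward:
    """Weighted sum of z-score-normalized reward signals (hypothetical scheme)."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = weights
        self.stats = {name: RunningStats() for name in weights}

    def __call__(self, signals: Dict[str, float]) -> float:
        total = 0.0
        for name, w in self.weights.items():
            s = self.stats[name]
            s.update(signals[name])  # statistics include the current observation
            z = (signals[name] - s.mean) / s.std()
            total += w * z
        return total
```

Normalizing first matters because raw signals such as log-likelihoods and divergence scores live on very different scales; without it, the weights would mostly compensate for units rather than express relative importance.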

Weaknesses

Methodology: The authors only evaluate their method on the Blocksworld dataset. Showing results on other reasoning datasets such as GSM8K, even if the experiment is limited in scope, would help show that the method generalizes to different types of tasks. The authors could provide more detail on how they chose hyperparameters, picked the reward clusters, etc. In my opinion, the claim that the model is interpretable is not sufficiently motivated. In particular, I do not see how their o

Reviewer 03 · Rating 3 · Confidence 3

Strengths

The introduction of a reward model based on contrastive decoding, which emphasizes action-level evaluation, enhances interpretability and robustness.
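As background for the contrastive-decoding reward this review mentions, here is a minimal sketch of the standard contrastive decoding token score: the expert model's log-probability minus the amateur model's, restricted to tokens the expert itself finds plausible (probability at least `alpha` times the expert's maximum). Whether SC-MCTS* uses this exact form or a divergence between the two distributions is not stated on this page; the toy distributions and `alpha = 0.1` are assumptions.

```python
import math
from typing import Dict


def contrastive_scores(
    p_expert: Dict[str, float],
    p_amateur: Dict[str, float],
    alpha: float = 0.1,
) -> Dict[str, float]:
    """Contrastive decoding score per candidate token.

    Tokens whose expert probability falls below alpha * max expert probability
    are masked to -inf (the adaptive plausibility constraint).
    Assumes every amateur probability is strictly positive.
    """
    cutoff = alpha * max(p_expert.values())
    scores = {}
    for tok, pe in p_expert.items():
        if pe >= cutoff:
            scores[tok] = math.log(pe) - math.log(p_amateur[tok])
        else:
            scores[tok] = -math.inf
    return scores
```

The plausibility mask is what keeps the score from rewarding tokens that are merely unlikely under the amateur: without it, a token both models consider nearly impossible could still win on the log-ratio alone.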

Weaknesses

* The impact of the evaluation of intermediate nodes on MCTS performance is significant but not discussed in depth. This oversight may lead to an incomplete understanding of the method's effectiveness.
* The novelty of applying MCTS to planning is somewhat diminished by the fact that this approach is already well-established in the literature. The paper would benefit from a more thorough comparison with existing methodologies to highlight its contributions.

Reviewer 04 · Rating 3 · Confidence 4

Strengths

- Shows significant improvement compared to the baseline
- Accelerates the search process by using speculative decoding

Weaknesses

- The article is unclear, especially in section 4, where the authors fail to describe how contrastive decoding and speculative decoding are applied in MCTS.
- Using MCTS in LLM reasoning is not a novel approach, as numerous papers have already discussed its application to LLMs. This paper doesn't present any innovative ideas to enhance reasoning capabilities. Although the authors claim speculative decoding as one of their contributions, this inference acceleration paradigm was already mentioned i
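Since the reviews refer to speculative decoding without detailing it, here is a minimal sketch of the standard speculative sampling acceptance rule. A draft token `x` proposed by a small model with probability `q(x)` is accepted with probability `min(1, p(x)/q(x))` under the target model's `p`; on rejection, a replacement is sampled from the normalized residual `max(0, p - q)`, which makes the output distributed exactly as `p`. How SC-MCTS* integrates this into node expansion is not described on this page; the toy dictionaries stand in for real model distributions.

```python
import random
from typing import Dict


def residual_distribution(p: Dict[str, float], q: Dict[str, float]) -> Dict[str, float]:
    """Normalized max(0, p - q): the distribution to resample from after a rejection."""
    resid = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
    z = sum(resid.values())  # > 0 whenever a rejection is possible
    return {t: v / z for t, v in resid.items()}


def verify_draft_token(
    token: str,
    p: Dict[str, float],
    q: Dict[str, float],
    rng: random.Random,
) -> str:
    """Accept the draft token with prob min(1, p/q); otherwise resample from the residual."""
    accept_prob = min(1.0, p.get(token, 0.0) / q[token])
    if rng.random() < accept_prob:
        return token
    resid = residual_distribution(p, q)
    tokens = list(resid)
    return rng.choices(tokens, weights=[resid[t] for t in tokens], k=1)[0]
```

The speedup comes from running the expensive target model once to score several drafted tokens in parallel; the rule above only decides which of those tokens survive, so output quality matches sampling from the target model directly.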

Reviewer 05 · Rating 8 · Confidence 3

Strengths

see question

Weaknesses

see question

Code & Models

Repositories

zitian-gao/sc-mcts
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques