Interpretable Contrastive Monte Carlo Tree Search Reasoning
Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen

TL;DR
This paper introduces SC-MCTS*, an improved Monte Carlo Tree Search algorithm for large language models that enhances reasoning accuracy and speed through interpretability and optimized components.
Contribution
The paper presents a novel, interpretable reward model and optimized strategies for MCTS, significantly boosting reasoning performance and efficiency in LLMs.
Findings
51.9% average speed improvement per node
17.4% performance increase on Blocksworld dataset
Enhanced interpretability and component analysis of MCTS
Abstract
We propose SC-MCTS*, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm for Large Language Models (LLMs) that significantly improves both reasoning accuracy and speed. Our motivation is threefold: 1. Previous work on MCTS-based LLM reasoning has often overlooked its biggest drawback, slower speed compared to CoT; 2. Previous research mainly used MCTS as a tool for LLM reasoning on various tasks, with limited quantitative analysis or ablation studies of its components from a reasoning interpretability perspective; 3. The reward model is the most crucial component in MCTS, yet previous work has rarely studied or improved MCTS's reward models in depth. Thus, we conducted extensive ablation studies and quantitative analysis on the components of MCTS, revealing the impact of each component on the MCTS reasoning performance of LLMs. Building on this, (i) we designed a highly interpretable…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- I like the overall methodology of reconsidering different components of MCTS and improving them.
- Achieved empirical performance is impressive, especially compared to o1.
- The setup and requirements are not explained well. The introduction focuses on MCTS and its drawbacks but does not clearly explain the authors' goal, objective, and setup. Only in line 148 do they mention that the focus is on using existing LLMs to achieve better reasoning, but this is also not clearly presented. I would appreciate it if the goal were explained in the introduction (along with a high-level setup and an example of a known expert). Additionally, in section 3.1, the authors could exp
The authors introduce several improvements to previous MCTS methods, and verify these improvements via an ablation study. In particular, their method of combining three reward signals and adaptively weighting them is interesting and (to my knowledge) novel. They demonstrate a clear improvement on the Blocksworld dataset against RAP-MCTS as well as CoT.
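As a rough illustration of what combining several reward signals can look like, the sketch below standardizes each signal across the candidate actions and then takes a weighted sum. This is a minimal sketch under stated assumptions, not the authors' method: the z-score normalization, the function names `zscore` and `combine_rewards`, and the signal names are all hypothetical, and the excerpt does not say how the paper's weights are adapted.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize raw rewards to zero mean and unit variance (assumed normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant signal carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_rewards(signals, weights):
    """Weighted sum of normalized reward signals.

    signals: dict mapping a (hypothetical) signal name to a list of raw
             rewards, one entry per candidate action.
    weights: dict mapping the same names to float weights.
    Returns one combined score per candidate action.
    """
    names = list(signals)
    n = len(signals[names[0]])
    normalized = {name: zscore(signals[name]) for name in names}
    return [
        sum(weights[name] * normalized[name][i] for name in names)
        for i in range(n)
    ]


# Hypothetical raw rewards for three candidate actions, on very different scales.
scores = combine_rewards(
    {"signal_a": [1.0, 2.0, 3.0], "signal_b": [10.0, 10.0, 10.0]},
    {"signal_a": 1.0, "signal_b": 5.0},
)
```

Normalizing before weighting keeps a signal with a large numeric range from dominating the sum, which is one plausible reason to treat the signals separately before combining them.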
**Methodology** The authors only evaluate their method on the Blocksworld dataset. Showing results on other reasoning datasets such as GSM-8k, even if the experiment is limited in scope, would help show that the method generalizes to different types of tasks. The authors could provide more detail as to how they chose hyperparameters, picked the reward clusters, etc. In my opinion, the claim that the model is interpretable is not sufficiently motivated. In particular, I do not see how their o
The introduction of a reward model based on contrastive decoding, which emphasizes action-level evaluation, enhances interpretability and robustness.
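For readers unfamiliar with contrastive scoring, one simple action-level form is the sum, over the tokens of an action, of the difference between a stronger "expert" model's log-probability and a weaker "amateur" model's log-probability. The sketch below shows only that generic idea; the function name and inputs are hypothetical, and the excerpt does not specify the exact reward the paper uses.

```python
import math


def contrastive_action_score(expert_probs, amateur_probs):
    """Sum of per-token log-probability differences for one action.

    expert_probs / amateur_probs: probabilities each model assigns to the
    same tokens of the action (hypothetical inputs, all in (0, 1]).
    A positive score means the expert prefers the action more than the amateur.
    """
    if len(expert_probs) != len(amateur_probs):
        raise ValueError("both models must score the same tokens")
    return sum(
        math.log(pe) - math.log(pa)
        for pe, pa in zip(expert_probs, amateur_probs)
    )


# Expert is twice as confident as the amateur on each of two tokens.
score = contrastive_action_score([0.5, 0.5], [0.25, 0.25])
```

Scoring whole actions rather than single tokens matches the reviewer's description of an action-level evaluation.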
* The impact of the evaluation of intermediate nodes on MCTS performance is significant but not discussed in depth. This oversight may lead to an incomplete understanding of the method's effectiveness.
* The novelty of applying MCTS to planning is somewhat diminished by the fact that this approach is already well-established in the literature. The paper would benefit from a more thorough comparison with existing methodologies to highlight its contributions.
- Shows significant improvement compared to the baseline
- Accelerates the search process by using speculative decoding
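The general idea behind speculative decoding can be sketched in its greedy form: a small draft model proposes several tokens, the larger target model checks them, and the longest agreeing prefix is kept. The code below is a toy sketch under that assumption; `speculative_step`, `draft_next`, and `target_next` are hypothetical names, and how the paper integrates speculative decoding into MCTS node expansion is not described in this excerpt.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One greedy speculative decoding step.

    draft_next / target_next: callables mapping a token list to that model's
    greedy next token (hypothetical stand-ins for real models).
    Returns the prefix extended by between 1 and k + 1 tokens, identical to
    what greedy decoding with the target model alone would produce.
    """
    # The cheap draft model proposes k tokens autoregressively.
    proposal = []
    for _ in range(k):
        proposal.append(draft_next(prefix + proposal))

    # The target model verifies each position. A real system would do this
    # in one batched forward pass; the loop here is only for clarity.
    accepted = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return prefix + accepted
        accepted.append(tok)

    # Every draft token was accepted, so the target adds one more token.
    accepted.append(target_next(prefix + accepted))
    return prefix + accepted


# Toy models: the target counts modulo 5; the draft agrees only on short inputs.
target = lambda seq: len(seq) % 5
draft = lambda seq: len(seq) % 5 if len(seq) < 3 else 0
result = speculative_step([0], draft, target)
```

The speedup comes from accepting several tokens per target-model pass when the draft agrees often; the output is unchanged relative to greedy decoding with the target model.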
- The article is unclear, especially in section 4 where the authors fail to describe how CONTRASTIVE DECODING and SPECULATIVE DECODING are applied in MCTS.
- Using MCTS in LLM reasoning is not a novel approach, as numerous papers have already discussed its application to LLMs. This paper doesn't present any innovative ideas to enhance reasoning capabilities. Although the authors claim speculative decoding as one of their contributions, this inference acceleration paradigm was already mentioned i
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Time Series Analysis and Forecasting · Natural Language Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
