PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection
Siyuan Cheng, Bozhong Tian, Yanchao Hao, Zheng Wei

TL;DR
PRISM-MCTS introduces a metacognitive reasoning framework that enhances search efficiency by sharing insights across trajectories, leading to improved performance on reasoning benchmarks with fewer resources.
Contribution
It proposes a novel framework combining shared memory and a process reward model to improve reasoning efficiency and effectiveness over traditional MCTS approaches.
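To make the shared-memory idea concrete, here is a toy sketch, not the PRISM-MCTS algorithm itself. It assumes a hypothetical task (guess a hidden 3-step sequence), a stand-in scorer `toy_prm` in place of a trained process reward model, and a plain random rollout loop rather than full MCTS with selection and backpropagation. All names are illustrative and do not come from the paper.

```python
import random

TARGET = (2, 0, 3)   # hypothetical hidden solution for the toy task
ACTIONS = range(4)   # each step picks one of four actions


def toy_prm(prefix):
    """Stand-in for a process reward model: 1.0 if the partial
    trajectory is still on track, 0.0 otherwise."""
    return 1.0 if tuple(prefix) == TARGET[:len(prefix)] else 0.0


def rollout(rng, memory):
    """Run one trajectory. `memory` is a set of prefixes already judged
    to be dead ends; pass None to make rollouts fully isolated."""
    prefix = []
    while len(prefix) < len(TARGET):
        # Skip continuations that an earlier rollout already ruled out.
        candidates = [a for a in ACTIONS
                      if memory is None or tuple(prefix + [a]) not in memory]
        prefix.append(rng.choice(candidates))
        if toy_prm(prefix) == 0.0:
            if memory is not None:
                memory.add(tuple(prefix))  # share the insight with later rollouts
            return False
    return True


def rollouts_until_solved(seed, share):
    """Count rollouts needed to hit TARGET, with or without shared memory."""
    rng = random.Random(seed)
    memory = set() if share else None
    count = 0
    while True:
        count += 1
        if rollout(rng, memory):
            return count


if __name__ == "__main__":
    seeds = range(200)
    shared = sum(rollouts_until_solved(s, True) for s in seeds) / len(seeds)
    isolated = sum(rollouts_until_solved(s, False) for s in seeds) / len(seeds)
    print(f"mean rollouts, shared memory: {shared:.1f}")
    print(f"mean rollouts, isolated:      {isolated:.1f}")
```

In this toy setting there are only nine dead-end prefixes, and each failed rollout records a new one, so the shared-memory variant needs at most ten rollouts, while isolated rollouts can repeat the same mistake indefinitely. The real framework's memory contents and reward model are presumably far richer than a set of failed prefixes.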
Findings
Halves trajectory requirements on the GPQA benchmark.
Outperforms MCTS-RAG and Search-o1 on reasoning tasks.
Achieves high-fidelity evaluation with few-shot training.
Abstract
Published: 06 Apr 2026, Last Modified: 06 Apr 2026
ACL 2026 Findings, CC BY 4.0
Keywords: Efficient/Low-Resource Methods for NLP, Generation, Question Answering
The emergence of reasoning models, exemplified by OpenAI o1, signifies a transition from intuitive to deliberative cognition, effectively reorienting the scaling laws from pre-training paradigms toward test-time computation. While Monte Carlo Tree Search (MCTS) has shown promise in this domain, existing approaches typically treat each rollout as an isolated trajectory. This lack of information sharing leads to severe inefficiency and substantial computational redundancy, as the search process fails to leverage…
