Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang

TL;DR
This paper presents MCT Self-Refine (MCTSr), an algorithm that combines LLMs with Monte Carlo Tree Search to improve mathematical reasoning accuracy on Olympiad-level problems.
Contribution
It introduces MCT Self-Refine (MCTSr), which integrates systematic tree-search exploration with heuristic LLM self-refinement and improves performance on mathematical reasoning benchmarks.
Findings
Improved success rates on Olympiad-level problems.
Effective integration of MCTS with LLMs for reasoning tasks.
Enhanced decision-making accuracy in complex problems.
Abstract
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including…
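The four phases named in the abstract (Selection, self-refine, self-evaluation, Backpropagation) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `ucb1`, `backpropagate`, `mctsr`, `toy_refine`, and `toy_evaluate` are hypothetical; the `refine` and `evaluate` callables stand in for LLM prompts; the textbook UCB1 formula is used in place of the paper's improved UCB formula, which this excerpt does not give; and the plain mean backup is likewise an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate answer in the search tree (hypothetical structure)."""
    answer: str
    score: float = 0.0                      # self-evaluation of this answer alone
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0               # includes rewards backed up from descendants

    @property
    def q(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def ucb1(node: Node, c: float = 1.4) -> float:
    """Textbook UCB1 (assumption: NOT the paper's improved formula)."""
    if node.visits == 0:
        return math.inf
    parent_visits = node.parent.visits if node.parent else node.visits
    return node.q + c * math.sqrt(math.log(parent_visits + 1) / node.visits)


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add the reward to the node and every ancestor (simple mean backup)."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def mctsr(question: str, draft: str,
          refine: Callable[[str, str], str],
          evaluate: Callable[[str, str], float],
          rollouts: int = 8, max_children: int = 2) -> str:
    root = Node(answer=draft, score=evaluate(question, draft))
    backpropagate(root, root.score)
    nodes = [root]
    for _ in range(rollouts):
        # Selection: best-UCB node that can still be expanded.
        candidates = [n for n in nodes if len(n.children) < max_children]
        parent = max(candidates, key=ucb1)
        # Self-refine: produce an improved answer from the selected one.
        child = Node(answer=refine(question, parent.answer), parent=parent)
        # Self-evaluation: score the refined answer.
        child.score = evaluate(question, child.answer)
        parent.children.append(child)
        nodes.append(child)
        # Backpropagation: update statistics along the path to the root.
        backpropagate(child, child.score)
    # Return the answer with the best individual self-evaluation score.
    return max(nodes, key=lambda n: n.score).answer


# Toy stand-ins for the LLM calls: the "answer" is an integer that should reach 5.
def toy_refine(question: str, answer: str) -> str:
    return str(int(answer) + 1)


def toy_evaluate(question: str, answer: str) -> float:
    return -abs(5 - int(answer))
```

Calling `mctsr("reach 5", "0", toy_refine, toy_evaluate)` runs the loop on the toy problem. With a real model, `refine` and `evaluate` would each wrap a prompt to the LLM, and the selection and backup rules would follow the paper's own definitions rather than the simplified ones assumed here.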
Code & Models
- 🤗 di-zhang-fdu/PPRM-gemma-2-2b-it · model · 3 dl · ♡ 2
- 🤗 di-zhang-fdu/OpenLongCoT-Base-Gemma2-2B · model · 6 dl · ♡ 8
- 🤗 c01zaut/OpenLongCoT-Base-Gemma2-2B-rk3588-1.1.1 · model · 1 dl
- 🤗 c01zaut/OpenLongCoT-Base-Gemma2-2B-rk3588-1.1.2 · model · 1 dl · ♡ 1
- 🤗 SimpleBerry/LLaMA-O1-Supervised-1129 · model · 9 dl · ♡ 23
- 🤗 QuantFactory/LLaMA-O1-Supervised-1129-GGUF · model · 28 dl · ♡ 2
Taxonomy
Topics: Mathematics, Computing, and Information Processing
