LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou

TL;DR
LLaMA-Berry introduces a novel framework combining Monte Carlo Tree Search, Self-Refine, and pairwise reward modeling to significantly improve mathematical reasoning in Large Language Models, especially for complex Olympiad problems.
Contribution
The paper presents a new optimization framework that integrates pairwise reward models with MCTS and Self-Refine, enhancing reasoning efficiency and accuracy in LLMs for advanced mathematical tasks.
Findings
- Outperforms existing methods like ToT and rStar on Olympiad benchmarks.
- Achieves higher problem-solving accuracy and search efficiency.
- Effective in complex and diverse mathematical reasoning tasks.
Abstract
This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry, for enhancing the mathematical reasoning ability of Large Language Models (LLMs). The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, Self-Refine applied to MCTS (SR-MCTS) overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms by fostering a more efficient exploration of solution spaces. The Pairwise Preference Reward Model (PPRM), inspired by Reinforcement Learning from Human Feedback (RLHF), is then used to model pairwise preferences between solutions, utilizing an Enhanced Borda Count (EBC) method to synthesize these preferences into a global ranking score to…
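To make the idea of turning pairwise preferences into a global ranking concrete, here is a minimal sketch of a plain Borda-style count over a pairwise preference matrix. It is not the paper's Enhanced Borda Count, whose details are cut off in the abstract above. The function names `borda_scores` and `rank_solutions` and the example matrix are hypothetical, and the pairwise judgments are assumed to come from some preference model such as PPRM.

```python
from typing import List


def borda_scores(prefers: List[List[bool]]) -> List[int]:
    """Return a Borda-style score per solution.

    prefers[i][j] is True when solution i was judged better than solution j.
    This matrix is an assumed input format, not taken from the paper.
    """
    n = len(prefers)
    # Score of i = number of other solutions that i beats pairwise.
    return [
        sum(1 for j in range(n) if j != i and prefers[i][j])
        for i in range(n)
    ]


def rank_solutions(prefers: List[List[bool]]) -> List[int]:
    """Return solution indices ordered from best to worst by Borda-style score."""
    scores = borda_scores(prefers)
    # sorted() is stable, so tied solutions keep their original order.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical pairwise judgments over three candidate solutions.
prefers = [
    [False, True, True],    # solution 0 beats solutions 1 and 2
    [False, False, False],  # solution 1 beats nothing
    [False, True, False],   # solution 2 beats solution 1
]

scores = borda_scores(prefers)
ranking = rank_solutions(prefers)
print(scores, ranking)
```

In this sketch each solution's score is simply its number of pairwise wins, so a solution that beats every other candidate ranks first. How ties and missing comparisons are resolved is a design choice that this simplified version leaves open.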
Code & Models
- 🤗 di-zhang-fdu/PPRM-gemma-2-2b-it (model) · 3 dl · ♡ 2
- 🤗 di-zhang-fdu/OpenLongCoT-Base-Gemma2-2B (model) · 6 dl · ♡ 8
- 🤗 c01zaut/OpenLongCoT-Base-Gemma2-2B-rk3588-1.1.1 (model) · 1 dl
- 🤗 c01zaut/OpenLongCoT-Base-Gemma2-2B-rk3588-1.1.2 (model) · 1 dl · ♡ 1
- 🤗 SimpleBerry/LLaMA-O1-Supervised-1129 (model) · 9 dl · ♡ 23
- 🤗 QuantFactory/LLaMA-O1-Supervised-1129-GGUF (model) · 28 dl · ♡ 2
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
