GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu

TL;DR
This paper introduces Stepwise ORMs (SORMs), trained only on synthetic data, to better identify when and where to refine a language model's reasoning, and combines global and local refinement strategies to improve reasoning accuracy.
Contribution
The paper proposes Stepwise ORMs trained on synthetic data to improve refinement decision-making and combines global and local refinements for enhanced reasoning accuracy.
Findings
SORMs outperform ORMs in detecting incorrect reasoning steps
Combining global and local refinements yields significant accuracy improvements
Achieves 65% accuracy on GSM8K with LLaMA-2 13B, up from 53%
Abstract
State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science or coding tasks. However, recent work demonstrates that even the best models struggle to identify when and where to refine without access to external feedback. Outcome-based Reward Models (ORMs), trained to predict correctness of the final answer indicating when to refine, offer one convenient solution for deciding when to refine. Process Based Reward Models (PRMs), trained to predict correctness of intermediate steps, can then be used to indicate where to refine. But they are expensive to train, requiring extensive human annotations. In this paper, we propose Stepwise ORMs (SORMs) which are trained, only on synthetic data, to approximate the expected future reward of the optimal policy or $V^{\star}$. More specifically, SORMs are trained to…
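The "when" and "where" roles described in the abstract can be illustrated with a minimal sketch. It assumes the reward models have already produced scores in [0, 1]: one outcome-level score for the final answer and one score per reasoning step. The function names, the 0.5 threshold, and the rule for choosing between a global and a local refinement are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def should_refine(final_answer_score: float, threshold: float = 0.5) -> bool:
    """'When': refine only if the outcome-level score suggests the draft is wrong."""
    return final_answer_score < threshold


def first_suspect_step(step_scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """'Where': index of the first step scored below the threshold, or None."""
    for index, score in enumerate(step_scores):
        if score < threshold:
            return index
    return None


def plan_refinement(
    final_answer_score: float,
    step_scores: List[float],
    threshold: float = 0.5,
) -> Tuple[str, Optional[int]]:
    """Return ("keep" | "global" | "local", step index or None).

    Assumption for illustration only: use a local refinement when a
    suspect step is located, otherwise fall back to a global one.
    """
    if not should_refine(final_answer_score, threshold):
        return ("keep", None)
    step = first_suspect_step(step_scores, threshold)
    if step is None:
        return ("global", None)
    return ("local", step)
```

For example, `plan_refinement(0.2, [0.9, 0.3, 0.1])` flags the draft for refinement and points at step index 1, the first step whose score falls below the threshold.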
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation
