GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

Alex Havrilla; Sharath Raparthy; Christoforus Nalmpantis; Jane Dwivedi-Yu; Maksym Zhuravinskyi; Eric Hambro; Roberta Raileanu

arXiv:2402.10963·cs.CL·June 26, 2024·3 cites

TL;DR

This paper introduces SORMs trained on synthetic data to better identify when to refine reasoning in language models, and combines global and local refinement strategies to significantly improve reasoning accuracy.

Contribution

The paper proposes Stepwise ORMs trained on synthetic data to improve refinement decision-making and combines global and local refinements for enhanced reasoning accuracy.

Findings

01

SORMs outperform ORMs in detecting incorrect reasoning steps

02

Combining global and local refinements yields significant accuracy improvements

03

Achieves 65% accuracy on GSM8K with LLaMA-2 13B, up from 53%

Abstract

State-of-the-art language models can exhibit impressive reasoning refinement capabilities on math, science, or coding tasks. However, recent work demonstrates that even the best models struggle to identify when and where to refine without access to external feedback. Outcome-based Reward Models (ORMs), trained to predict the correctness of the final answer, offer one convenient way to decide when to refine. Process-based Reward Models (PRMs), trained to predict the correctness of intermediate steps, can then be used to indicate where to refine, but they are expensive to train because they require extensive human annotations. In this paper, we propose Stepwise ORMs (SORMs), which are trained only on synthetic data to approximate the expected future reward of the optimal policy, $V^{\star}$. More specifically, SORMs are trained to…
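The "when" and "where" decisions described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `locate_refinement`, the `RefinementPlan` type, the 0.5 threshold, and the rule of flagging the first low-scoring step are all assumptions made for illustration. The scores are assumed to come from some outcome-level model (for "when") and some step-level model (for "where").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RefinementPlan:
    refine: bool                 # "when": should the draft be refined at all?
    step_index: Optional[int]    # "where": first suspect step, if one was found


def locate_refinement(
    final_answer_score: float,
    step_scores: List[float],
    threshold: float = 0.5,      # hypothetical cutoff, not taken from the paper
) -> RefinementPlan:
    """Decide when and where to refine from precomputed scores.

    final_answer_score: outcome-level estimate that the final answer is correct.
    step_scores: one step-level estimate per reasoning step.
    """
    # "When": keep the draft if the outcome-level score is high enough.
    if final_answer_score >= threshold:
        return RefinementPlan(refine=False, step_index=None)

    # "Where": flag the first step whose score falls below the threshold.
    for i, score in enumerate(step_scores):
        if score < threshold:
            return RefinementPlan(refine=True, step_index=i)

    # The draft looks wrong overall, but no single step stands out.
    return RefinementPlan(refine=True, step_index=None)


if __name__ == "__main__":
    print(locate_refinement(0.2, [0.9, 0.8, 0.3, 0.1]))
```

In this sketch a plan with `refine=True` and no `step_index` means the draft is judged likely wrong without a specific step being implicated; how such a case is handled is left open here.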

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation