Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

Vernon Y.H. Toh; Deepanway Ghosal; Soujanya Poria

arXiv:2410.12608 · cs.CL · December 18, 2024

TL;DR

Prove is a verification framework that translates natural language solutions into programs and uses those programs to filter out incorrect reasoning paths, significantly improving the accuracy of open-source language models on mathematical reasoning tasks.

Contribution

This work introduces Prove, a novel verification method leveraging program translations to enhance self-consistency in language models for math reasoning.

Findings

01. Prove outperforms vanilla majority voting across all tested models and datasets.

02. Prove achieves up to an 18% accuracy improvement on GSM8K.

03. Prove is effective across models ranging from 0.5B to 13B parameters.

Abstract

Large language models (LLMs) have shown increasing competence in solving mathematical reasoning problems. However, many open-source LLMs still struggle with errors in calculation and semantic understanding during intermediate reasoning steps. In this work, we introduce Prove, a simple yet effective framework that leverages translated programs derived from natural language solutions as a verification mechanism to filter out potentially incorrect reasoning paths before aggregating final answers. Unlike vanilla majority voting, our approach filters out solutions whose corresponding program output is inconsistent with the generated solution, aggregating only those that pass verification. We conducted extensive experiments using 13 open-source LLMs from various model families and sizes, ranging from 0.5B to 13B parameters, across eight mathematical benchmarks. Our results show that Prove…
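
The filter-then-vote idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` type, the function names, the numeric tolerance, and the fallback to plain majority voting when nothing passes verification are all assumptions made for illustration. In particular, the translation step (an LLM turning each natural language solution into a program) is replaced here by hand-written callables.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One sampled reasoning path (hypothetical structure for this sketch)."""
    answer: float                   # final answer stated in the natural language solution
    program: Callable[[], float]    # stand-in for the program translated from that solution


def run_program(program: Callable[[], float]) -> Optional[float]:
    """Execute a translated program; treat any failure as 'no output'."""
    try:
        return float(program())
    except Exception:
        return None


def majority_vote(answers: List[float]) -> float:
    """Vanilla self-consistency: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]


def verified_vote(candidates: List[Candidate], tol: float = 1e-6) -> float:
    """Keep only answers whose program output agrees with them, then vote."""
    verified = []
    for cand in candidates:
        output = run_program(cand.program)
        if output is not None and math.isclose(output, cand.answer, abs_tol=tol):
            verified.append(cand.answer)
    if not verified:
        # Assumption: fall back to vanilla voting if no path passes verification.
        return majority_vote([c.answer for c in candidates])
    return majority_vote(verified)


# Toy problem: "5 bags with 2 apples each". Three paths state 12 but their
# programs compute 10, so they fail verification; two paths state 10 and pass.
candidates = [
    Candidate(answer=12.0, program=lambda: 5 * 2),
    Candidate(answer=12.0, program=lambda: 5 * 2),
    Candidate(answer=12.0, program=lambda: 5 * 2),
    Candidate(answer=10.0, program=lambda: 5 * 2),
    Candidate(answer=10.0, program=lambda: 5 * 2),
]

print(majority_vote([c.answer for c in candidates]))  # 12.0
print(verified_vote(candidates))                      # 10.0
```

In this toy case vanilla majority voting picks the more frequent but wrong answer, while the verified vote discards the three paths whose program output disagrees with their stated answer and aggregates only the remaining two.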

Code & Models

Repositories

declare-lab/prove (Official)

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques