Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Yunxiang Zhang; Muhammad Khalifa; Lajanugen Logeswaran; Jaekyeom Kim; Moontae Lee; Honglak Lee; Lu Wang

arXiv:2404.17140·cs.CL·June 7, 2024

TL;DR

This paper investigates whether small language models (<= 13B parameters) can improve their reasoning accuracy through self-correction. It proposes a training pipeline built on self-generated critiques and finds that the gains are significant when the model is paired with a strong verifier.

Contribution

The paper introduces a method for training small LMs to self-correct reasoning errors using self-generated critiques, and shows that pairing the trained model with a strong verifier improves its reasoning accuracy.

Findings

01. Self-correction ability improves for two models on five datasets spanning math and commonsense reasoning.

02. Gains are significant when the model is paired with a strong GPT-4 verifier.

03. Gains are limited when the model relies on a weak self-verifier.
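The role of the verifier in these findings can be pictured with a minimal sketch. It assumes, as an illustration only, that the verifier returns a confidence score for the initial solution and that correction is attempted only when that score falls below a threshold. The names `solve_with_verifier`, `solve`, `verify`, `self_correct`, and `threshold` are hypothetical and do not come from the paper's code.

```python
from typing import Callable


def solve_with_verifier(
    question: str,
    solve: Callable[[str], str],                # hypothetical: small LM's first attempt
    verify: Callable[[str, str], float],        # hypothetical: verifier confidence in [0, 1]
    self_correct: Callable[[str, str], str],    # hypothetical: critique-then-refine step
    threshold: float = 0.5,                     # assumed cut-off, not taken from the paper
) -> str:
    """Return the initial solution unless the verifier doubts it."""
    initial = solve(question)
    # A strong verifier keeps correct answers untouched; a weak one may
    # send correct answers to refinement or let wrong answers through.
    if verify(question, initial) >= threshold:
        return initial
    return self_correct(question, initial)
```

Under this framing, the quality of `verify` bounds the benefit of `self_correct`, which is one way to read the contrast between findings 02 and 03.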

Abstract

Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether small (<= 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing their incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with…
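The two data-collection steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the filtering criterion, so the sketch assumes a critique is kept only if refining with it reaches the reference answer. All names (`Example`, `SFTRecord`, `collect_self_correction_data`, and the callables passed in) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    question: str
    reference: str        # known-correct solution, used as a hint for critiquing
    wrong_solution: str   # the model's own incorrect response


@dataclass
class SFTRecord:
    prompt: str   # question plus the incorrect solution
    target: str   # critique followed by the refined solution


def collect_self_correction_data(
    examples: List[Example],
    generate_critique: Callable[[str, str, str], str],  # (question, wrong, reference) -> critique
    refine: Callable[[str, str, str], str],             # (question, wrong, critique) -> new solution
    is_correct: Callable[[str, str], bool],             # (solution, reference) -> bool
) -> List[SFTRecord]:
    """Build fine-tuning records from the model's own critiques."""
    records: List[SFTRecord] = []
    for ex in examples:
        # Step 1: the correct solution guides the critique of the wrong one.
        critique = generate_critique(ex.question, ex.wrong_solution, ex.reference)
        refined = refine(ex.question, ex.wrong_solution, critique)
        # Step 2 (assumed filter): drop critiques that do not lead to a correct refinement.
        if not is_correct(refined, ex.reference):
            continue
        records.append(
            SFTRecord(
                prompt=f"{ex.question}\n{ex.wrong_solution}",
                target=f"{critique}\n{refined}",
            )
        )
    return records
```

The resulting records would then feed a standard supervised fine-tuning loop. Note that the reference solution appears only while generating critiques, not in the training prompt.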

Code & Models

Repositories

yunx-z/score.github.io (Official)

Videos

Small Language Models Need Strong Verifiers to Self-Correct Reasoning · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques