Weak-to-Strong Reasoning

Yuqing Yang; Yan Ma; Pengfei Liu

arXiv:2407.13647 · cs.CL · October 2, 2024

TL;DR

This paper introduces a progressive training framework in which a strong language model autonomously refines its own training data to improve its reasoning, without requiring input from a more advanced model or human-annotated data.

Contribution

The proposed method lets a strong model improve its reasoning by autonomously refining its training data rather than blindly imitating the weak supervisor, extending weak-to-strong learning to reasoning tasks without input from a more advanced model or human annotations.

Findings

01. Significant performance improvements on the GSM8K and MATH datasets.

02. Effective supervision of large models by smaller models on challenging reasoning tasks.

03. Validation of the approach in a forward-looking setup with Llama models.

Abstract

When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervision for these models. Weak-to-strong learning, which leverages a less capable model to unlock the latent abilities of a stronger model, proves valuable in this context. Yet, the efficacy of this approach for complex reasoning tasks is still untested. Furthermore, tackling reasoning tasks under the weak-to-strong setting currently lacks efficient methods to avoid blindly imitating the weak supervisor including its errors. In this paper, we introduce a progressive learning framework that enables the strong model to autonomously refine its training data, without requiring input from either a more advanced model or human-annotated data. This framework begins with supervised fine-tuning on a selective small but high-quality dataset, followed by…
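The abstract describes the strong model refining its training data rather than imitating every weak label, but this page does not say how samples are selected. As a rough illustration only, the sketch below assumes one plausible criterion: keep a question when the weak supervisor's final answer agrees with the strong model's own attempt. The helper names `final_answer` and `select_consistent`, and the "last number in the text" answer parser, are hypothetical and not taken from the paper or its repository.

```python
import re
from typing import Dict, Optional


def final_answer(solution: str) -> Optional[str]:
    """Return the last number found in a solution string, or None if absent.

    Assumption: the final numeric token is the answer (a common convention
    for GSM8K-style problems, not something this page specifies).
    """
    # Drop thousands separators so "1,000" is read as "1000".
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution.replace(",", ""))
    return numbers[-1] if numbers else None


def select_consistent(weak: Dict[str, str], strong: Dict[str, str]) -> Dict[str, str]:
    """Keep the strong model's solution where both final answers agree.

    Hypothetical filter: agreement between the two models is used as a
    label-free proxy for correctness.
    """
    kept = {}
    for question, weak_solution in weak.items():
        strong_solution = strong.get(question)
        if strong_solution is None:
            continue  # the strong model produced no attempt for this question
        weak_ans = final_answer(weak_solution)
        strong_ans = final_answer(strong_solution)
        if weak_ans is not None and weak_ans == strong_ans:
            kept[question] = strong_solution
    return kept


if __name__ == "__main__":
    weak = {"q1": "2 + 2 = 4", "q2": "The answer is 7"}
    strong = {"q1": "Adding the two numbers gives 4.", "q2": "So we get 9"}
    print(select_consistent(weak, strong))
```

A filter of this kind needs no gold labels, which matches the setting described in the abstract, but agreement is only a proxy: two models can agree on a wrong answer, so the retained set is higher quality on average rather than guaranteed correct.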

Code & Models

Repositories

gair-nlp/weak-to-strong-reasoning (PyTorch, official)

Videos

Weak-to-Strong Reasoning · underline

Taxonomy

Topics: Logic, Reasoning, and Knowledge