Self-rewarding correction for mathematical reasoning

Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang

arXiv:2502.19613 · cs.AI · February 28, 2025

TL;DR

This paper introduces a self-rewarding reasoning framework for large language models in which a single model detects errors in its own responses, corrects them, and decides when to stop refining, without any external feedback. The approach improves mathematical reasoning performance.
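The answer, self-judge, revise loop described in the TL;DR can be sketched as plain control flow. This is a minimal illustration only, assuming the single model's three roles (answering, judging its own answer, revising) can be wrapped as Python callables. `self_rewarding_correction`, `generate`, `self_evaluate`, `revise`, and the toy arithmetic "model" are hypothetical names for illustration, not the authors' code, prompts, or API.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases: in a self-rewarding model all three roles are played by
# the same LLM; here they are plain callables so the control flow runs on its own.
Generate = Callable[[str], str]            # problem -> first answer
SelfEvaluate = Callable[[str, str], bool]  # (problem, answer) -> model's own verdict
Revise = Callable[[str, str], str]         # (problem, answer) -> revised answer


def self_rewarding_correction(
    problem: str,
    generate: Generate,
    self_evaluate: SelfEvaluate,
    revise: Revise,
    max_turns: int = 3,
) -> Tuple[str, List[Tuple[str, bool]]]:
    """Answer, self-verify, and revise until the model's own verdict is "correct".

    Returns the final answer and the (answer, self_verdict) history.
    No external reward model or verifier is called inside the loop.
    """
    answer = generate(problem)
    history: List[Tuple[str, bool]] = []
    for turn in range(max_turns):
        is_correct = self_evaluate(problem, answer)
        history.append((answer, is_correct))
        # Stop when the model judges its answer correct, or when the turn budget is spent.
        if is_correct or turn == max_turns - 1:
            break
        answer = revise(problem, answer)
    return answer, history


# Toy stand-ins for a model solving "a+b" problems (illustration only).
def toy_generate(problem: str) -> str:
    return "6"  # deliberately wrong first attempt


def toy_evaluate(problem: str, answer: str) -> bool:
    a, b = problem.split("+")
    return int(a) + int(b) == int(answer)


def toy_revise(problem: str, answer: str) -> str:
    return str(int(answer) - 1)  # a crude "correction" step


final, history = self_rewarding_correction("2+3", toy_generate, toy_evaluate, toy_revise)
print(final, history)  # 5 [('6', False), ('5', True)]
```

Capping the loop at `max_turns` and returning the last self-evaluated answer is a design choice of this sketch. A real self-rewarding model's verdict can itself be wrong, which the toy evaluator here never is.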

Contribution

The paper proposes a two-stage algorithmic framework for training self-rewarding reasoning models using only self-generated data, strengthening the models' ability to reason, evaluate their own outputs, and correct themselves.

Findings

01

Outperforms intrinsic self-correction methods.

02

Achieves performance comparable to systems that rely on external reward models.

03

Demonstrates effectiveness on Llama-3 and Qwen-2.5 models.

Abstract

We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback. This integrated approach allows a single model to independently guide its reasoning process, offering computational advantages for model deployment. We particularly focus on the representative task of self-correction, where models autonomously detect errors in their responses, revise outputs, and decide when to terminate iterative refinement loops. To enable this, we propose a two-stage algorithmic framework for constructing self-rewarding reasoning models using only self-generated data. In the first stage, we employ sequential rejection sampling to synthesize long chain-of-thought trajectories that incorporate both self-rewarding and self-correction mechanisms.…
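The first stage, sequential rejection sampling, can be pictured as building a trajectory one step at a time and keeping a sampled step only if it passes a correctness filter. The sketch below is one reading of that idea, not the authors' pipeline. It assumes a gold answer is available for filtering during data synthesis, and it assumes one particular trajectory shape: an attempt, a self-verdict, and, if the verdict is "wrong", a revision plus a second verdict. The names `rejection_sample` and `synthesize_trajectory` are hypothetical, and the `[VERIFY]` markers are an illustrative format.

```python
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def rejection_sample(draw: Callable[[], T], accept: Callable[[T], bool],
                     max_tries: int = 16) -> Optional[T]:
    """Draw until a sample passes `accept`; return None if every try is rejected."""
    for _ in range(max_tries):
        sample = draw()
        if accept(sample):
            return sample
    return None


def synthesize_trajectory(
    gold: str,
    sample_answer: Callable[[], str],         # hypothetical: one LLM attempt
    sample_verdict: Callable[[str], bool],    # hypothetical: LLM judges an answer
    sample_revision: Callable[[str], str],    # hypothetical: LLM revises an answer
    max_tries: int = 16,
) -> Optional[List[str]]:
    """Build one attempt -> self-verify -> (revise -> self-verify) trajectory.

    Each step is rejection-sampled against the gold answer, so only trajectories whose
    self-verdicts are accurate and whose revisions fix the error are kept.
    """
    first = sample_answer()
    first_ok = first == gold
    # Step 1: keep a self-verdict only if it agrees with the gold-based label.
    verdict = rejection_sample(lambda: sample_verdict(first), lambda v: v == first_ok, max_tries)
    if verdict is None:
        return None
    if first_ok:
        return [first, "[VERIFY] correct"]
    # Step 2: the verdict was "wrong", so keep only revisions that reach the gold answer.
    revision = rejection_sample(lambda: sample_revision(first), lambda r: r == gold, max_tries)
    if revision is None:
        return None
    # Step 3: the revision must also be self-verified as correct.
    final = rejection_sample(lambda: sample_verdict(revision), lambda v: v, max_tries)
    if final is None:
        return None
    return [first, "[VERIFY] wrong", revision, "[VERIFY] correct"]


# Toy deterministic "model": wrong first answer, accurate self-verifier, correct revision.
traj = synthesize_trajectory(
    gold="5",
    sample_answer=lambda: "6",
    sample_verdict=lambda ans: ans == "5",
    sample_revision=lambda ans: "5",
)
print(traj)  # ['6', '[VERIFY] wrong', '5', '[VERIFY] correct']
```

A trajectory is discarded (returned as `None`) whenever a step cannot be sampled within `max_tries`, so the kept data only contains self-verdicts that agree with the gold label. With a real model, each `sample_*` call would be a stochastic LLM completion rather than a fixed toy function.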
