S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners

Yuchen Yan, Jin Jiang, Yang Liu, Yixin Cao, Xin Xu, Mengdi Zhang, Xunliang Cai, Jian Shao

arXiv:2409.01524 · cs.CL · July 1, 2025

TL;DR

This paper introduces S^3c-Math, an approach that enables large language models to spontaneously detect and correct errors during mathematical reasoning, significantly improving their accuracy and reliability.

Contribution

It presents the first method for spontaneous step-level self-correction in LLMs, improving their mathematical reasoning through a new data-construction method and training strategy.

Findings

1. Significant improvements on the GSM8K and MATH benchmarks.
2. Effective across various foundation LLMs.
3. First demonstration of spontaneous self-correction in mathematical reasoning.

Abstract

Self-correction is a novel method that can stimulate the latent reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent work does not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, external knowledge introduction, multi-model collaboration, and similar techniques. In this paper, we propose a series of mathematical LLMs called S^3c-Math, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning. This capability helps LLMs recognize whether their ongoing inference is likely to contain errors and simultaneously correct those errors to produce a more reliable response. We propose a method that employs a step-level sampling approach to…

Videos

S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners (Underline)

Taxonomy

Topics: Topic Modeling · Mathematics, Computing, and Information Processing · AI-based Problem Solving and Planning