Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing

Kaishuai Xu; Tiezheng Yu; Wenjun Hou; Yi Cheng; Chak Tou Leong; Liangyou Li; Xin Jiang; Lifeng Shang; Qun Liu; Wenjie Li

arXiv:2410.06638 · cs.CL · May 28, 2025

TL;DR

This paper introduces RISE, a preference learning framework that injects subtle errors into an LLM's own solutions to build training pairs, improving mathematical and logical reasoning accuracy without extensive annotation.

Contribution

RISE uses the LLM's own self-edited solutions, with subtle errors injected, as preference training data, improving reasoning accuracy more efficiently than existing preference learning approaches.

Findings

01. Improves GSM8K accuracy by 3.0%

02. Enhances MATH accuracy by 7.9%

03. Effective across reasoning and code generation tasks

Abstract

Large Language Models (LLMs) have exhibited strong mathematical reasoning prowess, tackling tasks ranging from basic arithmetic to advanced competition-level problems. However, frequently occurring subtle yet critical errors, such as miscalculations or incorrect substitutions, limit the LLMs' full potential. Existing studies to improve mathematical ability typically involve applying preference learning to step-wise solution pairs. Although these methods leverage samples of varying granularity to mitigate reasoning errors, they overlook critical subtle errors. In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation. In detail, RISE uses the LLM itself to edit a small number of tokens in the solution,…
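The pair construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: RISE has the LLM itself edit a few tokens, whereas the sketch below swaps in a simple rule-based edit (perturbing one number) as a stand-in. The function names `inject_numeric_error` and `build_preference_pair`, and the `prompt`/`chosen`/`rejected` record layout, are assumptions chosen for illustration.

```python
import random
import re


def inject_numeric_error(solution: str, rng: random.Random) -> str:
    """Return a copy of `solution` with exactly one number slightly changed.

    Assumption: a rule-based stand-in for the LLM self-editing step.
    It mimics a subtle miscalculation by shifting one number by 1 or 2.
    """
    matches = list(re.finditer(r"\d+", solution))
    if not matches:
        raise ValueError("solution contains no numeric token to edit")
    target = rng.choice(matches)
    original = int(target.group())
    delta = rng.choice([-2, -1, 1, 2])
    corrupted = original + delta
    if corrupted < 0:
        # Keep the edited value non-negative so the text stays plausible.
        corrupted = original + abs(delta)
    return solution[:target.start()] + str(corrupted) + solution[target.end():]


def build_preference_pair(prompt: str, solution: str, seed: int = 0) -> dict:
    """Pair a correct solution (chosen) with its error-injected copy (rejected)."""
    rng = random.Random(seed)  # seeded so the same input yields the same pair
    return {
        "prompt": prompt,
        "chosen": solution,
        "rejected": inject_numeric_error(solution, rng),
    }


if __name__ == "__main__":
    pair = build_preference_pair(
        "Each box holds 12 pens. How many pens are in 3 boxes?",
        "Each box holds 12 pens, so 3 boxes hold 3 * 12 = 36 pens.",
    )
    print(pair["chosen"])
    print(pair["rejected"])
```

Because the rejected solution differs from the chosen one in a single token, the pair is "hard" in the sense the abstract describes: a preference objective such as DPO has to separate the two on the edited token rather than on surface differences in style or length.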

Taxonomy

Topics: Software Engineering Research · Receptor Mechanisms and Signaling · Pharmacological Effects and Assays

Methods: Direct Preference Optimization · Focus