Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing
Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

TL;DR
This paper introduces RISE, a preference learning framework that injects subtle errors into an LLM's own solutions to build training pairs, improving mathematical and logical reasoning accuracy without extensive annotation.
Contribution
RISE trains on pairs of correct solutions and self-edited copies of them that contain injected errors. This improves reasoning accuracy more efficiently than existing preference learning approaches.
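The pair construction described above can be illustrated with a small sketch. In RISE the LLM itself performs the edit. Here a deterministic rule stands in for the model: it finds the first "a op b = c" equation in a step and changes only the result token, producing a hard pair that differs from the correct solution by one token. The names inject_miscalculation, build_pair and PreferencePair are hypothetical and not from the paper. The sketch also leaves all later steps untouched, which is a simplification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the LLM editor: matches "a + b = c", "a - b = c", "a * b = c".
EQUATION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # original, correct solution
    rejected: str  # same solution with one subtle injected error


def inject_miscalculation(step: str) -> Optional[str]:
    """Shift the result of the first equation in a step by +1; None if the step has no equation."""
    match = EQUATION.search(step)
    if match is None:
        return None
    wrong_result = int(match.group(4)) + 1
    start, end = match.span(4)  # character span of the result token only
    return step[:start] + str(wrong_result) + step[end:]


def build_pair(prompt: str, steps: List[str]) -> Optional[PreferencePair]:
    """Edit the first step that contains an equation; every other step is copied unchanged."""
    for i, step in enumerate(steps):
        edited = inject_miscalculation(step)
        if edited is not None:
            rejected_steps = steps[:i] + [edited] + steps[i + 1:]
            return PreferencePair(prompt, "\n".join(steps), "\n".join(rejected_steps))
    return None  # nothing pivotal to edit in this solution
```

For example, build_pair("Tom has 3 boxes of 4 apples. How many apples?", ["Each box has 4 apples.", "3 * 4 = 12", "The answer is 12."]) yields a rejected solution whose second step reads "3 * 4 = 13". Because the two solutions are nearly identical, the pair isolates the miscalculation rather than differences in style or length.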
Findings
Improves GSM8K accuracy by 3.0%
Enhances MATH accuracy by 7.9%
Effective across reasoning and code generation tasks
Abstract
Large Language Models (LLMs) have exhibited strong mathematical reasoning prowess, tackling tasks ranging from basic arithmetic to advanced competition-level problems. However, frequently occurring subtle yet critical errors, such as miscalculations or incorrect substitutions, limit the LLMs' full potential. Existing studies to improve mathematical ability typically involve applying preference learning to step-wise solution pairs. Although these methods leverage samples of varying granularity to mitigate reasoning errors, they overlook critical subtle errors. In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation. In detail, RISE uses the LLM itself to edit a small number of tokens in the solution,…
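The page lists Direct Preference Optimization as the method used to learn from these hard pairs. The sketch below shows the standard DPO objective on sequence-level log-probabilities. It is not the authors' training code, and whether RISE adds further loss terms is not stated here. The value beta = 0.1 is an assumed default, and each input is the summed token log-probability of a full solution under the trained policy or the frozen reference model.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Mean DPO loss over a batch of (correct, error-injected) pairs."""
    n = len(policy_chosen)
    if n == 0 or not (n == len(policy_rejected) == len(ref_chosen) == len(ref_rejected)):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: how much more the policy (relative to the reference)
        # prefers the correct solution over the error-injected one.
        margin = beta * ((pc - rc) - (pr - rr))
        total += softplus(-margin)  # equals -log(sigmoid(margin))
    return total / n
```

With an error-injected pair, the tokens before the edit are identical in both solutions and receive identical log-probabilities under each model, so they cancel in the margin. The loss is therefore driven by the edited token and the tokens that follow it, which is what makes such pairs "hard".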
Taxonomy
Topics: Software Engineering Research · Receptor Mechanisms and Signaling · Pharmacological Effects and Assays
Methods: Direct Preference Optimization · Focus
