Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems
Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du

TL;DR
This paper introduces DUP, a method that enhances LLMs' ability to solve math word problems by promoting deep understanding, leading to state-of-the-art accuracy on GSM8K and other benchmarks.
Contribution
The paper presents DUP, a novel approach focusing on reducing semantic misunderstanding errors to significantly improve LLMs' math reasoning performance.
Findings
DUP achieves 97.1% accuracy on GSM8K in the zero-shot setting.
DUP outperforms existing methods across 10 diverse reasoning benchmarks.
Deep understanding of problems is key to improving LLM reasoning.
Abstract
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short on complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors. Prior studies address the calculation errors and step-missing errors but neglect the semantic misunderstanding errors, which are the major factor limiting the reasoning performance of LLMs. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive…
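The abstract describes the idea only at a high level: have the model understand the problem and extract the key problem-solving information before it reasons. The sketch below is one possible way to arrange that as a prompting pipeline. The three-stage split, the function name `solve_with_understanding`, the `llm` callable, and all prompt wording are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable

# Hypothetical prompt templates. The wording is illustrative only and is
# not the prompt text used by the paper's authors.
CORE_QUESTION_PROMPT = (
    "{problem}\n"
    "State the core question of this problem in one sentence."
)
KEY_INFO_PROMPT = (
    "{problem}\n"
    "Core question: {core_question}\n"
    "List the information from the problem that is needed to answer it."
)
SOLVE_PROMPT = (
    "{problem}\n"
    "Core question: {core_question}\n"
    "Key information: {key_info}\n"
    "Solve the problem step by step and state the final answer."
)


def solve_with_understanding(problem: str, llm: Callable[[str], str]) -> str:
    """Run an understand-then-solve pipeline with any text-in/text-out model.

    `llm` is an assumed interface: a function that takes a prompt string
    and returns the model's reply as a string.
    """
    # Stage 1: ask the model what the problem is actually asking.
    core_question = llm(CORE_QUESTION_PROMPT.format(problem=problem)).strip()

    # Stage 2: ask the model for the facts relevant to that question.
    key_info = llm(
        KEY_INFO_PROMPT.format(problem=problem, core_question=core_question)
    ).strip()

    # Stage 3: solve, with the extracted understanding placed in the prompt.
    return llm(
        SOLVE_PROMPT.format(
            problem=problem, core_question=core_question, key_info=key_info
        )
    ).strip()
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; any function that maps a prompt string to a reply string can be supplied.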
Taxonomy
Topics: Library Science and Information Systems · Digital Rights Management and Security · Vehicle License Plate Recognition
