From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems
A M Muntasir Rahman, Junyi Ye, Wei Yao, Sierra S. Liu, Jesse Yu, Jonathan Yu, Wenpeng Yin, Guiling Wang

TL;DR
This paper introduces FaultyMath, a benchmark dataset to evaluate whether large language models can identify logical flaws in mathematical problems, revealing that most models act as blind solvers rather than logical thinkers.
Contribution
The paper presents a new diverse dataset and comprehensive evaluation framework to assess LLMs' ability to detect logical inconsistencies in math problems, highlighting current limitations.
Findings
Most LLMs act as blind solvers without deeper reasoning.
Models struggle to reliably detect faulty math problems.
Hints and explanations have limited impact on improving model reasoning.
Abstract
Consider the math problem: "Lily received 3 cookies from her best friend yesterday and ate 5 for breakfast. Today, her friend gave her 3 more cookies. How many cookies does Lily have now?" Many large language models (LLMs) in previous research approach this problem by calculating the answer "1" using the equation "3 - 5 + 3." However, from a human perspective, we recognize the inherent flaw in this problem: Lily cannot eat 5 cookies if she initially only had 3. This discrepancy prompts a key question: Are current LLMs merely Blind Solvers that apply mathematical operations without deeper reasoning, or can they function as Logical Thinkers capable of identifying logical inconsistencies? To explore this question, we propose a benchmark dataset, FaultyMath, which includes faulty math problems of rich diversity: i) multiple mathematical categories, e.g., algebra, geometry, number theory,…
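The cookie example can be made concrete with a minimal sketch that contrasts evaluating "3 - 5 + 3" directly with checking whether each intermediate step is physically possible. The function names `blind_solve` and `first_infeasible_step` and the signed-change encoding are illustrative assumptions, not part of FaultyMath or the paper's evaluation method.

```python
from itertools import accumulate

# Signed cookie changes in story order: +3 received, -5 eaten, +3 received.
# This encoding is an assumption made for illustration only.
events = [3, -5, 3]


def blind_solve(changes):
    """Apply the arithmetic without checking whether each step is possible."""
    return sum(changes)


def first_infeasible_step(changes):
    """Return the index of the first step whose running total drops below zero.

    Returns None when every intermediate count is non-negative.
    """
    for index, running_total in enumerate(accumulate(changes)):
        if running_total < 0:
            return index
    return None


print(blind_solve(events))            # 1
print(first_infeasible_step(events))  # 1 (eating 5 cookies when only 3 exist)
```

The direct sum returns 1, matching the answer the abstract attributes to many LLMs, while the running-total check flags the second event as impossible, which is the inconsistency a human reader notices.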
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Statistics Education and Methodologies
