Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
Jun Zhao, Jingqi Tong, Yurong Mou, Ming Zhang, Qi Zhang, Xuanjing Huang

TL;DR
This paper investigates the inability of large language models to systematically compose mathematical knowledge with knowledge of logical traps, revealing a core deficiency in their reasoning capabilities and exploring potential mitigation strategies.
Contribution
The study introduces MathTrap, a dataset of math problems with embedded logical traps, uses it to expose the compositional deficiency of LLMs in mathematical reasoning, and evaluates methods to improve their systematic compositionality.
Findings
LLMs struggle to combine mathematical knowledge with knowledge of logical traps.
Prompting and fine-tuning partially improve LLMs' compositional reasoning.
Human-like slow thinking enhances LLMs' ability to handle novel logical cases.
Abstract
Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MathTrap by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K. Since problems with logical flaws are quite rare in the real world, these represent "unseen" cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not spontaneously combine them…
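As a rough illustration of what composing the two kinds of knowledge could look like, the sketch below uses a made-up triangle-area problem. It is not taken from MathTrap, and the function names are hypothetical. One function stands for the knowledge needed by the original problem (Heron's formula), another for the knowledge related to the trap (the triangle inequality), and a solver that composes them checks the premise before computing.

```python
import math


def triangle_area(a: float, b: float, c: float) -> float:
    """'Original problem' knowledge: Heron's formula for the area."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def is_valid_triangle(a: float, b: float, c: float) -> bool:
    """'Trap' knowledge: side lengths must satisfy the triangle inequality."""
    return (
        min(a, b, c) > 0
        and a + b > c
        and a + c > b
        and b + c > a
    )


def solve(a: float, b: float, c: float):
    """Compose both pieces: reject an ill-posed problem before computing."""
    if not is_valid_triangle(a, b, c):
        return None  # the problem statement contains a logical flaw
    return triangle_area(a, b, c)


print(solve(3, 4, 5))  # well-posed problem
print(solve(1, 2, 5))  # hypothetical trap: no such triangle exists
```

Under this toy framing, applying `triangle_area` alone corresponds to answering the original problem without noticing the trap, while `solve` corresponds to combining both components.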
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
