Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

Jun Zhao, Jingqi Tong, Yurong Mou, Ming Zhang, Qi Zhang, Xuanjing Huang

arXiv:2405.06680 · cs.CL · October 11, 2024



TL;DR

This paper investigates the failure of large language models to systematically combine mathematical knowledge with knowledge about logical traps, revealing a core deficiency in their reasoning capabilities, and explores potential mitigation strategies.

Contribution

The study introduces MathTrap, a dataset of math problems containing logical traps, which exposes the compositional deficiency of LLMs in mathematical reasoning, and evaluates methods for improving their systematic compositionality.

Findings

1. LLMs struggle to combine mathematical knowledge with knowledge about logical traps.

2. Prompting and fine-tuning partially improve LLMs' compositional reasoning.

3. Human-like slow thinking enhances LLMs' ability to handle novel logical cases.

Abstract

Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset, MathTrap, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K. Since problems with logical flaws are quite rare in the real world, these represent "unseen" cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not spontaneously combine them…
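
The abstract's core distinction (a model that handles the original problem and the trap-related knowledge separately, yet fails the combined trap problem) can be made concrete with a small bookkeeping sketch. Everything below is an illustrative assumption rather than the authors' released code: the `TrapRecord` fields, `compositional_failures`, and the trap-to-original accuracy ratio are hypothetical names chosen for this example, and the boolean flags are assumed to come from some separate grading step.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class TrapRecord:
    """Hypothetical per-item grading result for one MathTrap-style problem."""
    original_correct: bool  # solved the unmodified MATH/GSM8K problem
    concept_correct: bool   # answered an isolated probe of the trap-related knowledge
    trap_correct: bool      # handled the trap in the modified problem


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True flags; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def compositional_failures(records: Sequence[TrapRecord]) -> int:
    """Count items where both knowledge components are present
    but the model still fails the composed trap problem."""
    return sum(
        1
        for r in records
        if r.original_correct and r.concept_correct and not r.trap_correct
    )


def trap_to_original_ratio(records: Sequence[TrapRecord]) -> float:
    """Hypothetical summary metric: trap accuracy relative to original accuracy."""
    original = accuracy(r.original_correct for r in records)
    trap = accuracy(r.trap_correct for r in records)
    return trap / original if original else 0.0


if __name__ == "__main__":
    # Toy flags for illustration only; these are not results from the paper.
    toy = [
        TrapRecord(True, True, False),   # knows both parts, fails the composed problem
        TrapRecord(True, True, True),
        TrapRecord(True, False, False),
        TrapRecord(False, True, False),
    ]
    print(compositional_failures(toy))            # → 1
    print(round(trap_to_original_ratio(toy), 3))  # → 0.333
```

Counting only items where both components are demonstrated separately distinguishes a composition failure from a plain knowledge gap, which is the distinction the abstract draws.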


Code & Models

Repositories

tongjingqi/MathTrap (official)


Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling