Large Language Models' Understanding of Math: Source Criticism and Extrapolation
Roozbeh Yousefzadeh, Xuenan Cao

TL;DR
This paper critically examines GPT-4's mathematical understanding, finding it primarily reproduces seen proofs rather than truly grasping concepts, and questions the value of its current approach for theorem proving.
Contribution
The study provides a critical evaluation of GPT-4's mathematical capabilities, highlighting its limitations in understanding and reasoning beyond reproducing seen proofs.
Findings
GPT-4 struggles with problems lacking known formal proofs.
GPT-4's theorem-proving ability appears to expand over time.
GPT-4's main strength is reproducing proofs it has already seen, not understanding them.
Abstract
It has been suggested that large language models such as GPT-4 have acquired some form of understanding beyond the correlations among the words in text, including some understanding of mathematics. Here, we perform a critical inquiry into this claim by evaluating the mathematical understanding of the GPT-4 model. Because GPT-4's training set is secret, it is not straightforward to evaluate whether the model's correct answers are based on mathematical understanding or on replication of proofs that the model has seen before. We specifically craft mathematical questions whose formal proofs are not readily available on the web, proofs that are more likely not to have been seen by GPT-4. We see that GPT-4 is unable to solve those problems despite their simplicity. It is hard to find scientific evidence suggesting that GPT-4 has acquired an understanding of even basic…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Byte Pair Encoding · Dropout · Adam · Softmax · Label Smoothing
