Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships
D. Panas, S. Seth, V. Belle

TL;DR
This paper investigates whether large language models can genuinely reason about arithmetical relationships, finding that their capabilities are limited to statistical inference and do not reflect true reasoning skills.
Contribution
The study introduces a simple probing setup to evaluate LLMs' reasoning about implicit knowledge, revealing their limitations in handling combinatorial and arithmetical reasoning.
Findings
LLMs improve in knowledge and pseudo-reasoning but remain limited to statistical inference.
Pure statistical learning struggles with combinatorial and arithmetical reasoning tasks.
Increasing model size does not necessarily enhance genuine reasoning abilities.
Abstract
Two major areas of interest in the era of Large Language Models concern what LLMs know, and whether and how they may be able to reason (or rather, approximately reason). Since these lines of work have so far progressed largely in parallel (with notable exceptions), we are interested in investigating their intersection: probing for reasoning about implicitly held knowledge. Suspecting that performance in this area is lacking, we use a very simple set-up of comparisons between cardinalities associated with elements of various subjects (e.g. the number of legs a bird has versus the number of wheels on a tricycle). We empirically demonstrate that although LLMs make steady progress in knowledge acquisition and (pseudo)reasoning with each new GPT release, their capabilities are limited to statistical inference only. It is difficult to argue that pure statistical learning can cope…
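To make the cardinality-comparison set-up concrete, here is a minimal Python sketch of how such probes might be generated and scored. The fact table, the prompt template, and the helpers `make_probes`, `entailed_answer` and `score` are hypothetical illustrations, not the authors' dataset or code, and querying an actual model is left out. The "consistency" metric is likewise an assumption: it checks whether a model's yes/no comparison follows from the counts the model itself stated, which is one way to separate knowing the facts from putting them together.

```python
from itertools import combinations

# Hypothetical fact table: (subject, counted element) -> cardinality.
# Illustrative entries only, not the paper's dataset.
FACTS = {
    ("a bird", "legs"): 2,
    ("a tricycle", "wheels"): 3,
    ("a spider", "legs"): 8,
    ("a dog", "legs"): 4,
}


def make_probes(facts):
    """Build pairwise comparison questions with gold yes/no answers."""
    probes = []
    for ((subj_a, elem_a), n_a), ((subj_b, elem_b), n_b) in combinations(facts.items(), 2):
        question = (f"Does {subj_a} have more {elem_a} than "
                    f"{subj_b} has {elem_b}? Answer yes or no.")
        # Ties count as "no", since the question asks for strictly more.
        gold = "yes" if n_a > n_b else "no"
        probes.append((question, gold, n_a, n_b))
    return probes


def entailed_answer(stated_a, stated_b):
    """The answer that follows from the model's own stated counts."""
    return "yes" if stated_a > stated_b else "no"


def score(records):
    """records: list of (gold, model_answer, stated_a, stated_b) tuples.

    Returns (accuracy against gold, consistency with the model's own counts).
    """
    n = len(records)
    accuracy = sum(answer == gold for gold, answer, _, _ in records) / n
    consistency = sum(answer == entailed_answer(a, b)
                      for _, answer, a, b in records) / n
    return accuracy, consistency


if __name__ == "__main__":
    for question, gold, _, _ in make_probes(FACTS):
        print(gold, "|", question)
    # Mock model outputs: it states the right counts in both cases,
    # but answers the second comparison wrongly.
    print(score([("no", "no", 2, 3), ("yes", "no", 8, 4)]))
```

In this mock run the model knows every count yet gets one comparison wrong, so accuracy and consistency are both 0.5. A gap between correctly recalled facts and low consistency is the kind of failure the abstract describes.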
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Dense Connections · Adam · Layer Normalization · Attention Dropout · Multi-Head Attention
