Tracing the ongoing emergence of human-like reasoning in Large Language Models
Paolo Morosi, Nikoleta Pantelidou, Fritz Günther, Elena Pagliarini, Evelina Leivada

TL;DR
This study compares how humans and large language models reason about conditionals across four languages, revealing that some LLMs follow the logic of conditionals accurately but ignore the pragmatic inferences that humans naturally draw.
Contribution
It analyzes how twenty-five LLMs compute pragmatic inferences from conditionals compared to matched groups of humans in four languages, highlighting the models' limitations in human-like reasoning.
Findings
Humans use pragmatic inferences in reasoning, which LLMs often ignore.
Some LLMs follow logical truth-tables but lack pragmatic understanding.
LLM accuracy is unaffected by training data type or architecture.
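The gap between following a truth-table and drawing a pragmatic inference can be made concrete with a small sketch. It is an illustration only, not the paper's scoring method. The function names are hypothetical, and treating the enriched "only if" reading as a biconditional is a common textbook simplification assumed here.

```python
from itertools import product


def material(p, q):
    """Literal truth-table reading of 'if p then q': false only when p holds and q does not."""
    return (not p) or q


def perfected(p, q):
    """Assumed enriched reading ('q if and only if p'), as in the lawn-mowing promise."""
    return p == q


def biscuit(p, q):
    """Assumed reading of 'if you are hungry, there is pizza': q holds regardless of p."""
    return q


def truth_table(reading):
    """Evaluate a reading on all four combinations of antecedent and consequent."""
    return {(p, q): reading(p, q) for p, q in product([True, False], repeat=2)}


def disagreements(reading_a, reading_b):
    """List the (p, q) cases on which two readings assign different truth values."""
    table_a, table_b = truth_table(reading_a), truth_table(reading_b)
    return [case for case in table_a if table_a[case] != table_b[case]]


if __name__ == "__main__":
    # Lawn not mowed, money paid anyway: literally true, pragmatically a violation.
    print(disagreements(material, perfected))
    # Hearer not hungry, no pizza: literally true, but the pizza claim itself fails.
    print(disagreements(material, biscuit))
```

Under these assumptions, each enriched reading departs from the literal truth-table on exactly one case, both with a false antecedent. A system that only applies the truth-table would judge those cases differently from a reader who draws the pragmatic inference.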
Abstract
Humans effortlessly go beyond literal meanings: "If you mow the lawn, I will give you fifty dollars" is typically understood as implying that the speaker will pay only if the lawn is mowed, whereas "If you are hungry, there is pizza in the oven" implies that pizza is available regardless of the hearer's hunger. Large Language Models (LLMs) show human-like performance on many tasks, yet it remains unclear whether they reason like humans. To address this, we conducted a population-matching experiment assessing how twenty-five LLMs compute conditional inferences across four languages, compared to an equal number of humans per language. We find that humans enrich logical reasoning through pragmatic inferences across languages. Model behavior is more variable. Some LLMs perfectly follow the truth-table of conditionals but they ignore pragmatic inferences, while others deviate from the…
