Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases
Risako Ando, Takanobu Morishita, Hirohiko Abe, Koji Mineshima, Mitsuhiro Okada

TL;DR
This study uses the NeuBAROCO dataset to assess whether large language models exhibit human-like biases in syllogistic reasoning. It finds that models perform worse on problems involving belief biases, conversion errors, and atmosphere effects.
Contribution
Introduces NeuBAROCO, a bilingual (English and Japanese) syllogistic reasoning dataset. Uses it to evaluate whether LLMs show human-like biases in logical inference, and highlights the limitations of current models on such problems.
Findings
LLMs struggle with syllogisms that involve belief biases.
LLMs struggle with problems that invite conversion errors.
LLM performance drops on problems involving atmosphere effects.
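The atmosphere effect is usually stated as a heuristic over the four classical moods of categorical statements:

- A: All S are P
- E: No S are P
- I: Some S are P
- O: Some S are not P

If either premise is negative, a negative conclusion is favoured. If either premise is particular, a particular conclusion is favoured. The minimal Python sketch below encodes that rule. The function name `atmosphere_prediction` is hypothetical and is not part of NeuBAROCO or the paper's code. The rule only predicts which conclusion *feels* right, not which one is valid.

```python
NEGATIVE = {"E", "O"}    # moods with a negative quality
PARTICULAR = {"I", "O"}  # moods with a particular ("Some") quantity
MOODS = {"A", "E", "I", "O"}


def atmosphere_prediction(major: str, minor: str) -> str:
    """Return the conclusion mood favoured by the atmosphere heuristic.

    Hypothetical helper for illustration: it models a reasoning bias,
    not logical validity.
    """
    for mood in (major, minor):
        if mood not in MOODS:
            raise ValueError(f"unknown mood: {mood}")
    negative = major in NEGATIVE or minor in NEGATIVE
    particular = major in PARTICULAR or minor in PARTICULAR
    if negative:
        return "O" if particular else "E"
    return "I" if particular else "A"


if __name__ == "__main__":
    for premises in [("A", "A"), ("A", "I"), ("E", "A"), ("E", "I"), ("I", "I")]:
        print(premises, "->", atmosphere_prediction(*premises))
```

Consider two premises of the form "Some ... are ...". They yield no valid syllogistic conclusion at all, yet the heuristic still favours an I conclusion. That is exactly the kind of answer a biased reasoner would accept.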
Abstract
This paper investigates whether current large language models exhibit human-like biases in logical reasoning. Specifically, we focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction. To facilitate our analysis, we introduce a dataset called NeuBAROCO, originally designed for psychological experiments that assess human logical abilities in syllogistic reasoning. The dataset consists of syllogistic inferences in both English and Japanese. We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects. Our findings demonstrate that current large language models struggle more with problems involving these three types of biases.
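All three biases are defined relative to logical validity, so it helps to see how a categorical syllogism can be checked mechanically. The Python sketch below is an illustration, not the authors' method. It assumes the modern (Boolean) reading of the quantifiers, without existential import. It enumerates every way of inhabiting the eight regions of a three-term Venn diagram (terms S, M, P). An inference is valid iff no such model makes the premises true and the conclusion false. The names `holds` and `is_valid` and the tuple encoding of statements are hypothetical.

```python
from itertools import product

# Assumption: Boolean semantics for "all"/"no"/"some"/"some_not",
# with no existential import for universal statements.
# Each Venn region is a triple of membership flags (in S, in M, in P).
REGIONS = list(product([False, True], repeat=3))
TERM_INDEX = {"S": 0, "M": 1, "P": 2}


def holds(statement, inhabited):
    """Evaluate one statement (quantifier, X, Y) in a model given by its inhabited regions."""
    quantifier, x, y = statement
    i, j = TERM_INDEX[x], TERM_INDEX[y]
    if quantifier == "all":       # All X are Y: no individual is X but not Y
        return not any(r[i] and not r[j] for r in inhabited)
    if quantifier == "no":        # No X are Y: no individual is both X and Y
        return not any(r[i] and r[j] for r in inhabited)
    if quantifier == "some":      # Some X are Y
        return any(r[i] and r[j] for r in inhabited)
    if quantifier == "some_not":  # Some X are not Y
        return any(r[i] and not r[j] for r in inhabited)
    raise ValueError(f"unknown quantifier: {quantifier}")


def is_valid(premises, conclusion):
    """True iff every model satisfying all premises also satisfies the conclusion."""
    # Each bitmask selects which regions are inhabited; mask 0 (an empty
    # domain) is skipped because first-order domains are non-empty.
    for mask in range(1, 1 << len(REGIONS)):
        inhabited = [r for k, r in enumerate(REGIONS) if (mask >> k) & 1]
        if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
            return False  # found a countermodel
    return True


if __name__ == "__main__":
    # Barbara: All M are P; All S are M; therefore All S are P.
    print(is_valid([("all", "M", "P"), ("all", "S", "M")], ("all", "S", "P")))
    # Conversion error: All S are P does not entail All P are S.
    print(is_valid([("all", "S", "P")], ("all", "P", "S")))
```

Under this encoding, converting an "All" statement is reported invalid (the conversion error), while converting "Some" and "No" statements is valid. Now take a classic belief-bias style item: "All roses are flowers; some flowers fade quickly; therefore some roses fade quickly" (S = roses, M = flowers, P = fade quickly). It is also reported invalid, even though its conclusion is believable.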
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
