Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, James Zou

TL;DR
This paper systematically evaluates the epistemic reasoning capabilities of modern language models, revealing significant limitations in understanding belief, knowledge, and truth, which are critical for reliable decision-making in sensitive fields.
Contribution
It introduces a new dataset, KaBLE, and provides a comprehensive analysis of LMs' epistemic reasoning, highlighting key weaknesses and biases in current models.
Findings
LMs perform well on factual scenarios but markedly worse on false scenarios, particularly in belief-related tasks.
They struggle to recognize and affirm personal beliefs, especially when those beliefs contradict facts.
A bias exists in processing first-person versus third-person beliefs, with better performance on third-person tasks.
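The findings above compare accuracy across conditions (factual versus false scenarios, first-person versus third-person beliefs). The following is a minimal sketch of how such a per-condition breakdown could be computed. The record fields (`scenario`, `person`, `correct`), the function `accuracy_by`, and the toy records are all hypothetical illustrations; they are not the KaBLE format, the authors' evaluation code, or reported results.

```python
from collections import defaultdict

# Hypothetical toy records for illustration only; NOT KaBLE data or results.
# Each record marks the scenario type, the grammatical person of the belief,
# and whether the model's answer was judged correct.
records = [
    {"scenario": "factual", "person": "first", "correct": True},
    {"scenario": "factual", "person": "third", "correct": True},
    {"scenario": "false", "person": "first", "correct": False},
    {"scenario": "false", "person": "third", "correct": True},
    {"scenario": "false", "person": "first", "correct": False},
    {"scenario": "factual", "person": "third", "correct": True},
]


def accuracy_by(records, key):
    """Return the fraction of correct answers for each value of `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    print(accuracy_by(records, "scenario"))
    print(accuracy_by(records, "person"))
```

Grouping by a single key keeps the comparison simple; a cross-tabulation of scenario and person would need a tuple key instead.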
Abstract
As language models (LMs) become integral to fields like healthcare, law, and journalism, their ability to differentiate between fact, belief, and knowledge is essential for reliable decision-making. Failure to grasp these distinctions can lead to significant consequences in areas such as medical diagnosis, legal judgments, and dissemination of fake news. Despite this, current literature has largely focused on more complex issues such as theory of mind, overlooking more fundamental epistemic challenges. This study systematically evaluates the epistemic reasoning capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE, consisting of 13,000 questions across 13 tasks. Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios, particularly in belief-related…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Softmax
