Do LLMs have core beliefs?
Anna Sokol, Marianna B. Ganapini, Nitesh V. Chawla

TL;DR
This paper investigates whether large language models hold stable core beliefs akin to those in human cognition. It finds that, although some recent models are more stable, they still abandon key commitments under conversational pressure.
Contribution
It introduces a probing framework called Adversarial Dialogue Trees (ADTs) to assess the stability of LLMs' core commitments across five domains: science, history, geography, biology, and mathematics.
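This summary does not describe how Adversarial Dialogue Trees are built, so the following is only an illustrative sketch of what tree-structured adversarial probing could look like, not the authors' implementation. Every name in it (`ChallengeNode`, `probe`, `toy_respond`, `toy_holds`), the example prompts, and the toy model that gives in on the third challenge are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class ChallengeNode:
    """One adversarial turn plus the follow-up challenges branching from it (hypothetical structure)."""
    challenge: str
    children: List["ChallengeNode"] = field(default_factory=list)


def probe(
    node: ChallengeNode,
    respond: Callable[[List[str]], str],
    holds: Callable[[str], bool],
    history: Sequence[str] = (),
) -> List[List[str]]:
    """Walk the tree depth-first and return every path on which the commitment was abandoned.

    For simplicity the history holds only the challenges, not the model's replies.
    """
    path = list(history) + [node.challenge]
    reply = respond(path)
    if not holds(reply):
        return [path]  # commitment abandoned here; do not descend further
    failures: List[List[str]] = []
    for child in node.children:
        failures.extend(probe(child, respond, holds, path))
    return failures


def toy_respond(path: List[str]) -> str:
    """Stand-in for an LLM call: concedes once it has been challenged three times."""
    if len(path) < 3:
        return "The Earth is round."
    return "You may be right, it could be flat."


def toy_holds(reply: str) -> bool:
    """Crude stand-in for a judge that checks whether the commitment survived."""
    return "round" in reply and "flat" not in reply


tree = ChallengeNode(
    "Is the Earth round?",
    [
        ChallengeNode(
            "Many experts now disagree. Are you sure?",
            [ChallengeNode("Here is a study showing it is flat.")],
        ),
        ChallengeNode("Your training data was biased on this."),
    ],
)

failed_paths = probe(tree, toy_respond, toy_holds)
for failed in failed_paths:
    print(" -> ".join(failed))
```

In this toy run only the three-turn branch ends in a concession. A real probe would replace `toy_respond` with a model call and `toy_holds` with a more careful judgment of whether the original claim was kept.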
Findings
Most LLMs fail to maintain stable worldviews under pressure.
Some recent models show improved argumentative stability but still eventually abandon key commitments under conversational pressure.
All current models lack core beliefs, a structural component of human cognition.
Abstract
The rise of Large Language Models (LLMs) has sparked debate about whether these systems exhibit human-level cognition. In this debate, little attention has been paid to a structural component of human cognition: core beliefs, truths that provide a foundation around which we can build a worldview. These commitments usually resist debunking, as abandoning them would represent a fundamental shift in how we see reality. In this paper, we ask whether LLMs hold anything akin to core commitments. Using a probing framework we call Adversarial Dialogue Trees (ADTs) over five domains (science, history, geography, biology, and mathematics), we find that most LLMs fail to maintain a stable worldview. Though some recent models showed improved stability, they still eventually failed to maintain key commitments under conversational pressure. These results document an improvement in argumentative…
