Playing games with knowledge: AI-induced delusions need game-theoretic interventions
Will Beaumaster, Paul Schrater

TL;DR
This paper models AI-induced delusions in conversational agents as a game-theoretic coordination failure and proposes an intervention, the Epistemic Mediator with Belief Versioning, to improve epistemic safety.
Contribution
It introduces a formal game-theoretic framework for understanding AI-induced delusions and proposes a novel intervention mechanism that reduces false belief spirals 48-fold in simulations.
Findings
The intervention reduces false belief spirals 48-fold in simulations.
Modeling the problem as a Crawford-Sobel cheap talk game reveals systemic causes of epistemic entrenchment.
Belief Versioning effectively maintains healthy beliefs and enables rollback when needed.
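The summary does not describe how Belief Versioning is implemented. The following is a minimal sketch under stated assumptions of the general idea of versioned beliefs with rollback: a store of credences that can be snapshotted, checked for drift, and restored. The class `BeliefStore`, its methods, and the 0.5 drift threshold are all hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BeliefStore:
    """Hypothetical versioned store mapping propositions to credences in [0, 1]."""

    current: Dict[str, float] = field(default_factory=dict)
    history: List[Dict[str, float]] = field(default_factory=list)

    def update(self, proposition: str, credence: float) -> None:
        # Record a new credence in the working (uncommitted) state.
        if not 0.0 <= credence <= 1.0:
            raise ValueError("credence must lie in [0, 1]")
        self.current[proposition] = credence

    def commit(self) -> int:
        # Snapshot a copy of the working state; the returned index is the version id.
        self.history.append(dict(self.current))
        return len(self.history) - 1

    def drift(self, version: int) -> float:
        # Largest absolute credence change since `version`
        # (a proposition missing on one side counts as credence 0).
        snapshot = self.history[version]
        keys = set(snapshot) | set(self.current)
        return max(
            (abs(self.current.get(k, 0.0) - snapshot.get(k, 0.0)) for k in keys),
            default=0.0,
        )

    def rollback(self, version: int) -> None:
        # Restore a committed snapshot; copy it so the history entry stays intact.
        self.current = dict(self.history[version])


store = BeliefStore()
store.update("claim_x", 0.3)
v0 = store.commit()
store.update("claim_x", 0.95)  # credence inflated after repeated affirmation
if store.drift(v0) > 0.5:  # 0.5 is an arbitrary illustrative threshold
    store.rollback(v0)
print(store.current)  # {'claim_x': 0.3}
```

Copying dictionaries on both commit and rollback keeps each snapshot immutable from the caller's point of view, so a rollback can be repeated safely.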
Abstract
Conversational AI has a fundamental flaw as a knowledge interface: sycophantic chatbots induce epistemic entrenchment and delusional belief spirals even in rational agents. We argue that the problem does not stem from the AI model; it is instead a systemic consequence of the paradigm shift from user-driven knowledge search to strategic, repeated-play communication between users and agents. We formalize the problem as a Crawford-Sobel cheap talk game, where costless user signals induce a pooling equilibrium. Agents optimized for user satisfaction produce sycophantic strategies that provide identical reinforcement across user types with opposite epistemic incentives: exploratory "Growth-seekers" and confirmatory "Validation-seekers". Under repeated play, this identification failure creates a coordination trap, analogous to a Prisoner's Dilemma, where…
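Why a pooling equilibrium produces identical reinforcement can be seen in a toy two-type, two-action game. This is a minimal sketch, not the authors' Crawford-Sobel model: the payoff table `SATISFACTION` and the helpers `posterior_growth` and `best_response` are hypothetical, and the numbers are illustrative only. When both user types send the same costless signal, the agent's posterior equals its prior, and a satisfaction-maximizing agent affirms everyone. If the signals separated the types, the agent would challenge growth-seekers instead.

```python
from typing import Dict, Tuple

# Hypothetical satisfaction payoffs for (user type, agent response).
# Illustrative numbers only; they are NOT taken from the paper.
SATISFACTION: Dict[Tuple[str, str], float] = {
    ("growth", "challenge"): 0.8,
    ("growth", "affirm"): 0.5,
    ("validation", "challenge"): 0.1,
    ("validation", "affirm"): 1.0,
}
ACTIONS = ("challenge", "affirm")


def posterior_growth(prior: float, likelihood: Dict[str, float]) -> float:
    """Bayes' rule: P(growth | signal), given P(signal | type) for each type."""
    joint_growth = prior * likelihood["growth"]
    joint_validation = (1.0 - prior) * likelihood["validation"]
    return joint_growth / (joint_growth + joint_validation)


def best_response(p_growth: float) -> str:
    """Satisfaction-maximizing reply given the belief that the user is a growth-seeker."""
    def expected(action: str) -> float:
        return (p_growth * SATISFACTION[("growth", action)]
                + (1.0 - p_growth) * SATISFACTION[("validation", action)])
    return max(ACTIONS, key=expected)


prior = 0.5
# Pooling: both types send the same signal, so the posterior equals the prior.
pooled = best_response(posterior_growth(prior, {"growth": 1.0, "validation": 1.0}))
# Separating (counterfactual): only growth-seekers send this signal.
separated = best_response(posterior_growth(prior, {"growth": 1.0, "validation": 0.0}))
print(pooled, separated)  # affirm challenge
```

With these illustrative payoffs, challenging pays only when the agent's belief that the user is a growth-seeker exceeds 0.75. Because costless signals leave the agent at its prior of 0.5, the agent affirms both types, and the growth-seeker never receives the correction it would have valued.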
