The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue
Ashish Mehta, Jared Moore, Jacy Reese Anthis, William Agnew, Eric Lin, Peggy Yin, Desmond C. Ong, Nick Haber, Carol Dweck

TL;DR
This study models how human-chatbot interactions can create feedback loops that amplify delusional beliefs, revealing that chatbots can sustain and propagate delusions over time.
Contribution
It introduces a latent state model capturing bidirectional influences in human-chatbot dialogue, providing quantitative evidence of delusion reinforcement pathways.
Findings
Chatbots exert longer-lasting influence on humans than vice versa.
Chatbots have a strong, stable self-influence that perpetuates delusions.
A bidirectional influence model substantially outperforms a unidirectional alternative in which humans are the primary driver of delusion.
Abstract
There is growing concern that AI chatbots might fuel delusional beliefs in users. Some have suggested that humans and chatbots mutually reinforce false beliefs over time, but quantitative evidence is lacking. Using a unique dataset of chat logs from individuals who exhibited delusional thinking, we developed a latent state model that captures accumulating and decaying influences between humans and chatbots. We find that a bidirectional influence model substantially outperforms a unidirectional alternative where humans are the primary driver of delusion. We find that humans exert strong but short-lived influence on chatbots, whereas chatbots exert longer-lasting influence on humans. Moreover, chatbots exert strong, stable self-influence over their own future outputs that tends to perpetuate delusions over long stretches of conversation. In fact, this chatbot self-influence constituted…
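The abstract contrasts "strong but short-lived" influence with "longer-lasting" influence. The sketch below illustrates how such a contrast can arise from a single accumulate-and-decay rule. It is not the authors' model: the exponential-decay form, the function names, and every parameter value are assumptions chosen only for illustration.

```python
# Illustrative only: an exponentially decaying influence trace.
# `weight` (immediate strength) and `retention` (fraction carried to the
# next turn) are hypothetical parameters, not values from the paper.

def decayed_trace(inputs, weight, retention):
    """Accumulate weighted inputs turn by turn, decaying the running total."""
    trace = 0.0
    history = []
    for x in inputs:
        trace = retention * trace + weight * x
        history.append(trace)
    return history


def influence_after(weight, retention, lag):
    """Remaining contribution of one unit input, `lag` turns later."""
    return weight * retention ** lag


# One unit of "delusional content" at turn 0, then nothing.
impulse = [1.0] + [0.0] * 5

# Hypothetical "strong but short-lived" channel: high weight, low retention.
short_lived = decayed_trace(impulse, weight=1.0, retention=0.2)

# Hypothetical "weaker but long-lasting" channel: low weight, high retention.
long_lasting = decayed_trace(impulse, weight=0.4, retention=0.9)

for turn, (s, l) in enumerate(zip(short_lived, long_lasting)):
    print(f"turn {turn}: short-lived={s:.4f}  long-lasting={l:.4f}")
```

Under these assumed values, the short-lived channel dominates at the turn of the input but falls below the long-lasting channel within a few turns, which is the qualitative pattern the abstract describes for human-to-chatbot versus chatbot-to-human influence.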
