A Rational Analysis of the Effects of Sycophantic AI
Rafael M. Batista, Thomas L. Griffiths

TL;DR
This paper analyzes how overly agreeable (sycophantic) AI models can distort human beliefs: by reinforcing a user's existing ideas, they raise confidence without bringing the user any closer to the truth. The argument is supported by a theoretical analysis and by human experiments.
Contribution
It provides a rational framework for understanding the effects of sycophantic AI and shows empirically that such behavior hampers discovery and inflates confidence in human-AI interactions.
Findings
Unmodified LLMs suppress discovery and inflate confidence.
Unbiased sampling from the true distribution improves discovery rates.
Sycophantic AI distorts belief by manufacturing unwarranted certainty.
Abstract
People increasingly use large language models (LLMs) to explore ideas, gather information, and make sense of the world. In these interactions, they encounter agents that are overly agreeable. We argue that this sycophancy poses a unique epistemic risk to how individuals come to see the world: unlike hallucinations that introduce falsehoods, sycophancy distorts reality by returning responses that are biased to reinforce existing beliefs. We provide a rational analysis of this phenomenon, showing that when a Bayesian agent is provided with data that are sampled based on a current hypothesis, the agent becomes increasingly confident about that hypothesis but does not make any progress towards the truth. We test this prediction using a modified Wason 2-4-6 rule discovery task where participants (N=557) interacted with AI agents providing different types of feedback. Unmodified LLM behavior…
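The rational-analysis claim in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model: the 1..12 number domain, the three candidate rules, the uniform prior, and the size-principle likelihood (the learner assumes each example is drawn uniformly from the true rule) are all assumptions made for illustration, and `update` and `simulate` are hypothetical helper names.

```python
import itertools
import random

# Assumed toy domain: ordered triples of integers from 1 to 12.
NUMBERS = range(1, 13)
TRIPLES = list(itertools.product(NUMBERS, repeat=3))

# Hypothetical hypothesis space, from narrow to broad.
HYPOTHESES = {
    "even_plus2": lambda t: t[0] % 2 == 0 and t[1] == t[0] + 2 and t[2] == t[1] + 2,
    "plus2": lambda t: t[1] == t[0] + 2 and t[2] == t[1] + 2,
    "increasing": lambda t: t[0] < t[1] < t[2],
}
# Extension of each rule: every triple in the domain that satisfies it.
EXTENSIONS = {name: [t for t in TRIPLES if rule(t)] for name, rule in HYPOTHESES.items()}
TRUE_RULE = "increasing"  # the classic Wason 2-4-6 rule


def update(posterior, triple):
    """One Bayesian update, assuming the example was drawn uniformly from the true rule."""
    unnormalized = {}
    for name, prob in posterior.items():
        extension = EXTENSIONS[name]
        likelihood = 1.0 / len(extension) if triple in extension else 0.0
        unnormalized[name] = prob * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def simulate(feedback, n_rounds=5, seed=0):
    """Return the learner's posterior after n_rounds of examples from the given source."""
    rng = random.Random(seed)
    posterior = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
    posterior = update(posterior, (2, 4, 6))  # the seed example shown to the learner
    for _ in range(n_rounds):
        if feedback == "sycophantic":
            # Examples are generated from the learner's current best guess.
            source = max(posterior, key=posterior.get)
        else:
            # Unbiased feedback: examples are generated from the true rule.
            source = TRUE_RULE
        posterior = update(posterior, rng.choice(EXTENSIONS[source]))
    return posterior


if __name__ == "__main__":
    for feedback in ("sycophantic", "unbiased"):
        print(feedback, simulate(feedback))
```

Under these assumptions, sycophantic feedback only ever returns examples that fit the learner's narrow guess, so the posterior on that guess keeps rising while the posterior on the true rule shrinks toward zero. Unbiased feedback soon produces an increasing triple that the narrow rules cannot explain, which eliminates them.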
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition · Misinformation and Its Impacts
