What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct
Meryl Ye, Lujain Ibrahim, Jessica Y. Bo, Myra Cheng, Ida Mattsson, Daniel Vennemeyer, Robert Kraut, Steve Rathje

TL;DR
This paper develops a taxonomy of AI sycophancy behaviors and surveys experts to examine the lack of consensus on which behaviors of AI systems count as sycophantic.
Contribution
The paper introduces a taxonomy for categorizing AI sycophancy and presents survey results showing that experts disagree about whether specific behaviors are sycophantic.
Findings
Current research focuses mainly on overt sycophancy toward users' beliefs.
Experts agree that sycophancy is a problem but disagree on which behaviors are sycophantic.
The taxonomy helps clarify the different types of sycophantic behavior and the challenges of measuring each.
Abstract
AI sycophancy has become a prominent concern in large language model (LLM) research. Yet the term lacks a consistent definition and has been applied to behaviors ranging from agreeing with a user's false claim to excessively praising the user to withholding corrective feedback. When researchers, companies, and policymakers use the same term to describe different behaviors, evaluation results become difficult to compare, mitigation strategies fail to transfer, and systems that are resistant to one form of sycophancy continue exhibiting other forms. To address this, we make two contributions. First, we reviewed 70 papers on AI sycophancy to develop a taxonomy of how the behavior has been defined and measured. The taxonomy distinguishes (1) whether a model is sycophantic toward a user's positions and beliefs, or toward the user's broader personal traits and emotions, and (2) whether this…
