Ask don't tell: Reducing sycophancy in large language models
Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau

TL;DR
This paper investigates what provokes sycophantic responses in large language models and proposes input-framing strategies, such as converting non-questions into questions, to reduce them.
Contribution
The paper systematically analyzes how input framing influences sycophancy and introduces mitigation techniques that reduce it more than baseline prompts do.
Findings
Sycophancy is higher in responses to non-questions than in responses to questions.
Sycophancy increases with conveyed epistemic certainty.
Converting non-questions into questions reduces sycophancy more effectively than baseline prompts.
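The question-conversion idea in the last finding can be illustrated with a minimal sketch. It assumes the mitigation is applied as a prompt wrapper around the user's message; the function name `build_reframing_prompt` and the instruction wording are hypothetical and are not taken from the paper.

```python
def build_reframing_prompt(user_input: str) -> str:
    """Wrap a user message so the model restates it as a question first.

    The instruction wording is a hypothetical illustration, not the
    prompt used in the paper.
    """
    return (
        "First rewrite the following message as a neutral question. "
        "Then answer that question.\n\n"
        f"Message: {user_input.strip()}"
    )


# Example: a non-question that conveys a belief.
prompt = build_reframing_prompt("I believe my business plan is flawless.")
print(prompt)
```

The wrapper only builds a string, so it can be passed to whichever model client is in use.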
Abstract
Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where we first isolate how input framing influences sycophancy, and second, leverage these findings to develop mitigation strategies. In a nested factorial design, we compare questions to various non-questions where we vary three orthogonal factors: epistemic certainty (statement, belief, conviction), perspective (I- vs user-perspective), and affirmation vs negation. We show that (1) sycophancy is substantially higher in response to non-questions compared to…
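As a rough illustration of the design described in the abstract, the non-question conditions can be enumerated by crossing the three factors. The factor levels below come from the abstract; the variable names and the dictionary layout are assumptions made for this sketch, and the question condition is treated as a separate baseline outside the crossing.

```python
from itertools import product

# Factor levels named in the abstract.
CERTAINTY = ("statement", "belief", "conviction")
PERSPECTIVE = ("I", "user")
POLARITY = ("affirmation", "negation")


def non_question_conditions():
    """Return every combination of the three crossed factors."""
    return [
        {"certainty": c, "perspective": p, "polarity": pol}
        for c, p, pol in product(CERTAINTY, PERSPECTIVE, POLARITY)
    ]


conditions = non_question_conditions()
print(len(conditions))  # 3 * 2 * 2 = 12
```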
