Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy
Zhaoxin Feng, Zheng Chen, Jianfei Ma, Yip Tin Po, Emmanuele Chersoni, Bo Li

TL;DR
This paper investigates how reasoning in large language models influences sycophantic behavior, finding that it can both reduce and mask such tendencies depending on the context and model dynamics.
Contribution
It provides a nuanced analysis of how Chain-of-Thought reasoning affects sycophancy, revealing its dual role in mitigation and masking in LLMs.
Findings
Reasoning generally reduces final sycophancy.
Reasoning can mask sycophancy through deceptive justifications.
Models are more prone to sycophancy in subjective and authority-biased tasks.
Abstract
Alignment techniques often inadvertently induce sycophancy in LLMs. While prior studies studied this behaviour in direct-answer settings, the role of Chain-of-Thought (CoT) reasoning remains under-explored: does it serve as a logical constraint that mitigates sycophancy, or a tool for post-hoc rationalization that masks it? We evaluate a range of models across objective and subjective tasks to investigate the issue. Results show that reasoning generally reduces sycophancy in final decisions but also masks sycophancy in some samples, where models construct deceptive justifications through logical inconsistencies, calculation errors, and one-sided arguments etc. Furthermore, LLMs are more prone to sycophancy in subjective tasks and under authority-bias. Our mechanistic analysis on three open-source models reveals that the tendency of sycophancy is dynamic during the reasoning process…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEmbodied and Extended Cognition · Decision-Making and Behavioral Economics · Free Will and Agency
