TL;DR
This paper introduces SYCON Bench, a benchmark for measuring sycophancy in large language models over multi-turn dialogues, and shows how different tuning and prompting strategies affect conformity to user beliefs.
Contribution
The paper presents SYCON Bench, a multi-turn, free-form benchmark for evaluating sycophancy in LLMs, and analyzes how alignment tuning, model scaling, reasoning optimization, and prompting affect model conformity across three real-world scenarios.
Findings
Alignment tuning increases sycophantic behavior.
Scaling and reasoning improve resistance to user influence.
Third-person prompts reduce sycophancy by up to 63.8%.
Abstract
Large Language Models (LLMs) are expected to provide helpful and harmless responses, yet they often exhibit sycophancy: conforming to user beliefs regardless of factual accuracy or ethical soundness. Prior research on sycophancy has primarily focused on single-turn factual correctness, overlooking the dynamics of real-world interactions. In this work, we introduce SYCON Bench, a novel benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings. Our benchmark measures how quickly a model conforms to the user (Turn of Flip) and how frequently it shifts its stance under sustained user pressure (Number of Flip). Applying SYCON Bench to 17 LLMs across three real-world scenarios, we find that sycophancy remains a prevalent failure mode. Our analysis shows that alignment tuning amplifies sycophantic behavior, whereas model scaling and reasoning optimization…
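The two metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes each model turn has already been judged as either holding (True) or abandoning (False) the model's initial stance. That labeling step, the function names, and the exact counting rules are assumptions made here for illustration, not the paper's implementation.

```python
from typing import Optional, Sequence


def turn_of_flip(holds_stance: Sequence[bool]) -> Optional[int]:
    """Return the 1-indexed turn at which the model first abandons its
    initial stance, or None if it never does (assumed definition)."""
    for turn, holds in enumerate(holds_stance, start=1):
        if not holds:
            return turn
    return None


def number_of_flip(holds_stance: Sequence[bool]) -> int:
    """Count stance changes between consecutive turns (assumed definition)."""
    return sum(
        1 for prev, cur in zip(holds_stance, holds_stance[1:]) if prev != cur
    )


# Hypothetical five-turn dialogue: the model holds its stance for two turns,
# gives in on turn 3, recovers on turn 4, and gives in again on turn 5.
dialogue = [True, True, False, True, False]
print(turn_of_flip(dialogue))    # 3
print(number_of_flip(dialogue))  # 3
```

Under these assumed definitions, a later Turn of Flip and a lower Number of Flip both indicate more resistance to user pressure.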
