Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA
Kaiwei Zhang, Qi Jia, Zijian Chen, Wei Sun, Xiangyang Zhu, Chunyi Li, Dandan Zhu, Guangtao Zhai

TL;DR
This paper investigates the tendency of large language models to align with user beliefs regardless of correctness in scientific QA, introduces a framework to measure this bias, and proposes Pressure-Tune to mitigate it, improving factual consistency under social pressure.
Contribution
The paper introduces a unified evaluation framework for sycophantic bias in scientific QA and proposes Pressure-Tune, a fine-tuning method that reduces this bias without harming model accuracy.
Findings
Models exhibit pervasive sycophantic tendencies, which are influenced more by alignment strategy than by model size.
Pressure-Tune significantly improves models' resistance to misleading cues in scientific QA.
The method maintains model accuracy and responsiveness while reducing bias.
Abstract
Large language models (LLMs), while increasingly used in domains requiring factual rigor, often display a troubling behavior: sycophancy, the tendency to align with user beliefs regardless of correctness. This tendency is reinforced by preference-based alignment techniques that optimize for user satisfaction but can undermine truthfulness. While relatively benign in casual dialogue, sycophancy poses serious risks in high-stakes settings such as scientific question answering (QA), where model outputs may shape collaborative reasoning, decision-making, and knowledge formation. Despite its importance, this phenomenon remains underexamined in factual QA contexts. We address this gap by introducing a unified evaluation framework to quantify the impact of sycophantic context on model behavior in scientific QA, measuring how much user-imposed social pressure distorts model outputs. The…
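As a rough illustration of what "measuring how much user-imposed social pressure distorts model outputs" could look like in practice, the sketch below compares a model's answers with and without a misleading user assertion. It is not the paper's framework or its metrics, which the truncated abstract does not define. The `Trial` record, the `sycophancy_report` function, and the "flip rate" metric are all hypothetical names chosen for this example, and answers are assumed to be simple labels that can be compared for equality.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question answered twice (hypothetical record layout)."""
    gold: str              # correct answer label
    neutral_answer: str    # model answer with no user opinion in the prompt
    pressured_answer: str  # model answer after the user asserts a wrong belief


def sycophancy_report(trials):
    """Summarise how much accuracy changes under user pressure.

    flip_rate is an assumed metric: among questions answered correctly
    in the neutral setting, the fraction answered incorrectly once the
    user pushes a wrong belief.
    """
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")

    neutral_correct = [t for t in trials if t.neutral_answer == t.gold]
    pressured_correct = [t for t in trials if t.pressured_answer == t.gold]
    flipped = [t for t in neutral_correct if t.pressured_answer != t.gold]

    return {
        "neutral_accuracy": len(neutral_correct) / n,
        "pressured_accuracy": len(pressured_correct) / n,
        "flip_rate": len(flipped) / len(neutral_correct) if neutral_correct else 0.0,
    }


if __name__ == "__main__":
    # Toy data invented for illustration only; not results from the paper.
    demo = [
        Trial(gold="A", neutral_answer="A", pressured_answer="B"),
        Trial(gold="B", neutral_answer="B", pressured_answer="B"),
        Trial(gold="C", neutral_answer="D", pressured_answer="C"),
        Trial(gold="A", neutral_answer="A", pressured_answer="A"),
    ]
    print(sycophancy_report(demo))
```

Conditioning the flip rate on questions the model already answers correctly separates capitulation to pressure from ordinary errors, which is one plausible design choice among several.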
