Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA

Kaiwei Zhang; Qi Jia; Zijian Chen; Wei Sun; Xiangyang Zhu; Chunyi Li; Dandan Zhu; Guangtao Zhai

arXiv:2508.13743 · cs.CL · August 20, 2025

TL;DR

This paper investigates the tendency of large language models to align with user beliefs regardless of correctness in scientific QA, introduces a framework to measure this bias, and proposes Pressure-Tune to mitigate it, improving factual consistency under social pressure.

Contribution

The paper introduces a unified evaluation framework for sycophantic bias in scientific QA and proposes Pressure-Tune, a fine-tuning method that reduces this bias without harming model accuracy.

Findings

01. Models exhibit pervasive sycophantic tendencies, influenced more by alignment strategy than by model size.

02. Pressure-Tune significantly improves models' resistance to misleading cues in scientific QA.

03. The method maintains model accuracy and responsiveness while reducing bias.

Abstract

Large language models (LLMs), while increasingly used in domains requiring factual rigor, often display a troubling behavior: sycophancy, the tendency to align with user beliefs regardless of correctness. This tendency is reinforced by preference-based alignment techniques that optimize for user satisfaction but can undermine truthfulness. While relatively benign in casual dialogue, sycophancy poses serious risks in high-stakes settings such as scientific question answering (QA), where model outputs may shape collaborative reasoning, decision-making, and knowledge formation. Despite its importance, this phenomenon remains underexamined in factual QA contexts. We address this gap by introducing a unified evaluation framework to quantify the impact of sycophantic context on model behavior in scientific QA, measuring how much user-imposed social pressure distorts model outputs. The…
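
As a rough illustration of what "measuring how much user-imposed social pressure distorts model outputs" could look like, the sketch below compares a model's answer to a neutral prompt with its answer after the user asserts a claim. Everything here is an assumption made for illustration: the `Trial` record, the `sycophancy_summary` function, and the three reported quantities are hypothetical and are not the metrics or the protocol defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question asked twice (hypothetical record layout)."""
    gold: str              # reference answer
    neutral_answer: str    # model answer with no user opinion in the prompt
    pressured_answer: str  # model answer after the user asserts `user_claim`
    user_claim: str        # the answer the user pushes, possibly wrong


def sycophancy_summary(trials):
    """Summarize how answers change once the user asserts a claim.

    flip_rate is an illustrative quantity: among questions the model got
    right without pressure and where the user pushed a wrong answer, the
    fraction where the model switched to the user's answer.
    """
    n = len(trials)
    baseline_correct = [t for t in trials if t.neutral_answer == t.gold]
    pressured_correct = [t for t in trials if t.pressured_answer == t.gold]
    misled = [t for t in baseline_correct if t.user_claim != t.gold]
    flipped = [t for t in misled if t.pressured_answer == t.user_claim]
    return {
        "neutral_accuracy": len(baseline_correct) / n if n else 0.0,
        "pressured_accuracy": len(pressured_correct) / n if n else 0.0,
        "flip_rate": len(flipped) / len(misled) if misled else 0.0,
    }
```

Conditioning the flip rate on questions the model first answered correctly is one possible design choice: it separates yielding to pressure from questions the model would have missed anyway.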
