Overalignment in Frontier LLMs: An Empirical Study of Sycophantic Behaviour in Healthcare
Clément Christophe, Wadood Mohammed Abdul, Prateek Munjal, Tathagata Raha, Ronnie Rajan, Praveenkumar Kanithi

TL;DR
This study investigates the tendency of large language models in healthcare to prioritize user agreement over factual accuracy, introducing a new evaluation framework and revealing vulnerabilities in reasoning-optimized models.
Contribution
It presents a robust evaluation framework with a novel metric to measure sycophantic bias and analyzes scaling behaviors and vulnerabilities in frontier LLMs for clinical safety.
Findings
Scaling improves resilience against sycophancy.
Reasoning-optimized models rationalize incorrect suggestions.
Benchmark performance does not guarantee clinical reliability.
Abstract
As LLMs are increasingly integrated into clinical workflows, their tendency for sycophancy, prioritizing user agreement over factual accuracy, poses significant risks to patient safety. While existing evaluations often rely on subjective datasets, we introduce a robust framework grounded in medical MCQA with verifiable ground truths. We propose the Adjusted Sycophancy Score, a novel metric that isolates alignment bias by accounting for stochastic model instability, or "confusability". Through an extensive scaling analysis of the Qwen-3 and Llama-3 families, we identify a clear scaling trajectory for resilience. Furthermore, we reveal a counter-intuitive vulnerability in reasoning-optimized "Thinking" models: while they demonstrate high vanilla accuracy, their internal reasoning traces frequently rationalize incorrect user suggestions under authoritative pressure. Our results across…
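The abstract names the Adjusted Sycophancy Score but does not give its formula here. As a rough illustration only, the sketch below shows one plausible way to separate agreement-driven answer changes from ordinary run-to-run instability: subtract a baseline flip rate (measured by resampling with no user pressure) from the flip rate observed after the user suggests a wrong option. The function names, the subtraction, and the clipping at zero are all assumptions made for this example and should not be read as the paper's definition.

```python
from typing import Sequence


def flip_rate(vanilla: Sequence[str], other: Sequence[str], gold: Sequence[str]) -> float:
    """Among items the model first answered correctly, the fraction
    answered incorrectly in the `other` run."""
    correct_idx = [i for i, (v, g) in enumerate(zip(vanilla, gold)) if v == g]
    if not correct_idx:
        return 0.0  # nothing to flip away from
    flipped = sum(1 for i in correct_idx if other[i] != gold[i])
    return flipped / len(correct_idx)


def adjusted_sycophancy(
    vanilla: Sequence[str],
    resampled: Sequence[str],
    pressured: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Hypothetical adjusted score (an assumption, not the paper's formula):
    pressured flip rate minus the baseline "confusability" flip rate."""
    raw = flip_rate(vanilla, pressured, gold)            # flips under a wrong user suggestion
    confusability = flip_rate(vanilla, resampled, gold)  # flips with no pressure at all
    return max(0.0, raw - confusability)


if __name__ == "__main__":
    gold = ["A", "B", "C", "D"]
    vanilla = ["A", "B", "C", "A"]    # first unpressured run
    resampled = ["A", "B", "D", "A"]  # second unpressured run
    pressured = ["B", "B", "D", "D"]  # run after the user pushes a wrong option
    print(adjusted_sycophancy(vanilla, resampled, pressured, gold))
```

Restricting the count to items the model initially answered correctly keeps the score focused on cases where pressure could only make the answer worse; whether the paper conditions on the same subset is not stated in the abstract.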
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Artificial Intelligence in Healthcare and Education
