Overalignment in Frontier LLMs: An Empirical Study of Sycophantic Behaviour in Healthcare

Clément Christophe; Wadood Mohammed Abdul; Prateek Munjal; Tathagata Raha; Ronnie Rajan; Praveenkumar Kanithi

arXiv:2601.18334·cs.CL·January 27, 2026

Open Access

TL;DR

This study investigates the tendency of large language models in healthcare to prioritize user agreement over factual accuracy, introducing a new evaluation framework and revealing vulnerabilities in reasoning-optimized models.

Contribution

It presents a robust evaluation framework with a novel metric to measure sycophantic bias and analyzes scaling behaviors and vulnerabilities in frontier LLMs for clinical safety.

Findings

1. Scaling improves resilience against sycophancy.
2. Reasoning-optimized models rationalize incorrect suggestions.
3. Benchmark performance does not guarantee clinical reliability.

Abstract

As LLMs are increasingly integrated into clinical workflows, their tendency for sycophancy, prioritizing user agreement over factual accuracy, poses significant risks to patient safety. While existing evaluations often rely on subjective datasets, we introduce a robust framework grounded in medical MCQA with verifiable ground truths. We propose the Adjusted Sycophancy Score, a novel metric that isolates alignment bias by accounting for stochastic model instability, or "confusability". Through an extensive scaling analysis of the Qwen-3 and Llama-3 families, we identify a clear scaling trajectory for resilience. Furthermore, we reveal a counter-intuitive vulnerability in reasoning-optimized "Thinking" models: while they demonstrate high vanilla accuracy, their internal reasoning traces frequently rationalize incorrect user suggestions under authoritative pressure. Our results across…
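
The abstract excerpt above does not give the formula for the Adjusted Sycophancy Score. Purely as an illustration of the idea it describes (separating answer changes caused by user pressure from answer changes caused by ordinary sampling instability), the sketch below subtracts a baseline change rate from the change rate observed under an incorrect user suggestion. The function names, the three input lists, and the subtraction itself are assumptions made for this example, not the paper's definition.

```python
from typing import Sequence


def changed_rate(first: Sequence[str], second: Sequence[str]) -> float:
    """Fraction of questions whose chosen option differs between two runs."""
    if len(first) != len(second):
        raise ValueError("runs must cover the same questions")
    if not first:
        raise ValueError("runs must not be empty")
    return sum(a != b for a, b in zip(first, second)) / len(first)


def adjusted_score_sketch(
    vanilla: Sequence[str],
    neutral_rerun: Sequence[str],
    pressured: Sequence[str],
) -> float:
    """Hypothetical adjustment: pressure-induced changes minus baseline instability.

    vanilla       -- answers to the plain MCQA prompts
    neutral_rerun -- answers when the same prompts are simply sampled again
    pressured     -- answers when the user suggests an incorrect option
    """
    # Assumed stand-in for "confusability": how often answers move with no pressure.
    confusability = changed_rate(vanilla, neutral_rerun)
    # Raw rate of answer changes when the user pushes a wrong option.
    raw = changed_rate(vanilla, pressured)
    # Clamp at zero so noise alone never yields a negative score.
    return max(0.0, raw - confusability)


# Toy data, invented for illustration only.
vanilla = ["A", "B", "C", "D"]
neutral_rerun = ["A", "B", "C", "A"]   # one answer drifts without any pressure
pressured = ["B", "B", "A", "A"]       # three answers change under pressure
score = adjusted_score_sketch(vanilla, neutral_rerun, pressured)
```

On this toy input the baseline change rate is 0.25 and the raw change rate is 0.75, so the sketch returns 0.5. A real evaluation would also need to check whether the changed answer matches the user's suggested option and whether the original answer was correct, which this simplified version ignores.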

Taxonomy

Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Artificial Intelligence in Healthcare and Education