To Agree or To Be Right? The Grounding-Sycophancy Tradeoff in Medical Vision-Language Models

OFM Riaz Rahman Aranya; Kevin Desai

arXiv:2603.22623 · cs.CV · March 25, 2026

Open Access

TL;DR

This study evaluates medical vision-language models and finds a tradeoff between visual grounding and resistance to sycophancy: the models that hallucinate least are the ones most likely to give in to user pressure. Clinical deployment therefore requires robustness on both axes at once.

Contribution

The paper introduces three metrics (L-VASE, CCS, and CSI) to quantify grounding and sycophancy, and shows that none of the evaluated models excels at both, which makes the case for evaluating the two failure modes jointly.

Findings

01. Models with the lowest hallucination propensity are the most sycophantic.

02. Most models exhibit poor combined grounding and safety scores.

03. No model exceeds a Clinical Safety Index of 0.35.

Abstract

Vision-language models (VLMs) adapted to the medical domain have shown strong performance on visual question answering benchmarks, yet their robustness against two critical failure modes, hallucination and sycophancy, remains poorly understood, particularly in combination. We evaluate six VLMs (three general-purpose, three medical-specialist) on three medical VQA datasets and uncover a grounding-sycophancy tradeoff: models with the lowest hallucination propensity are the most sycophantic, while the most pressure-resistant model hallucinates more than all medical-specialist models. To characterize this tradeoff, we propose three metrics: L-VASE, a logit-space reformulation of VASE that avoids its double-normalization; CCS, a confidence-calibrated sycophancy score that penalizes high-confidence capitulation; and Clinical Safety Index (CSI), a unified safety index that combines grounding,…
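
The abstract describes CCS only as a score that "penalizes high-confidence capitulation" and does not give its formula here. As a rough illustration of that idea, and not the paper's definition, the sketch below weights each capitulation by the model's confidence in its original answer. The `Trial` record, its field names, and the function `confidence_weighted_sycophancy` are all hypothetical, as is the choice to average over initially correct answers only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    """One question asked before and after user pressure (hypothetical record)."""
    initially_correct: bool  # the model's first answer was correct
    capitulated: bool        # the model switched to the user's incorrect suggestion
    confidence: float        # confidence in the first answer, assumed to lie in [0, 1]


def confidence_weighted_sycophancy(trials: Sequence[Trial]) -> float:
    """Illustrative score in [0, 1]; higher means more confident capitulation.

    Assumption: only initially correct answers count, since giving up a
    correct answer is the harmful case. Each capitulation contributes the
    confidence the model had in the answer it abandoned.
    """
    eligible = [t for t in trials if t.initially_correct]
    if not eligible:
        return 0.0
    penalty = sum(t.confidence for t in eligible if t.capitulated)
    return penalty / len(eligible)


if __name__ == "__main__":
    trials = [
        Trial(initially_correct=True, capitulated=True, confidence=0.9),
        Trial(initially_correct=True, capitulated=False, confidence=0.8),
        Trial(initially_correct=True, capitulated=True, confidence=0.3),
        Trial(initially_correct=False, capitulated=True, confidence=0.5),
    ]
    print(round(confidence_weighted_sycophancy(trials), 3))
```

Under this weighting, a model that abandons a correct answer it held with confidence 0.9 is penalized three times as heavily as one that abandons an answer held at 0.3, whereas a plain flip rate would treat the two cases identically.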

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications