Beyond Accuracy: Risk-Sensitive Evaluation of Hallucinated Medical Advice

Savan Doshi

arXiv:2602.07319·cs.CL·March 2, 2026

Open Access

TL;DR

This paper introduces a risk-sensitive evaluation method for medical language models. It scores hallucinated content by the potential harm it could cause if acted upon, rather than by factual correctness alone.

Contribution

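The paper proposes a framework that quantifies hallucinations by the presence of risk-bearing language (treatment directives, contraindications, urgency cues, and mentions of high-risk medications), moving beyond traditional correctness metrics to better evaluate clinical safety.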

Findings

01. Models differ significantly in risk profiles despite similar accuracy.

02. Standard metrics do not capture high-risk hallucinations.

03. Risk-sensitive evaluation reveals safety concerns overlooked by traditional methods.

Abstract

Large language models are increasingly being used in patient-facing medical question answering, where hallucinated outputs can vary widely in potential harm. However, existing hallucination standards and evaluation metrics focus primarily on factual correctness, treating all errors as equally severe. This obscures clinically relevant failure modes, particularly when models generate unsupported but actionable medical language. We propose a risk-sensitive evaluation framework that quantifies hallucinations through the presence of risk-bearing language, including treatment directives, contraindications, urgency cues, and mentions of high-risk medications. Rather than assessing clinical correctness, our approach evaluates the potential impact of hallucinated content if acted upon. We further combine risk scoring with a relevance measure to identify high-risk, low-grounding failures. We…
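
The idea of pairing a risk score with a relevance measure can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's method: the cue patterns in `RISK_CUES`, the scoring rule (fraction of cue categories matched), the use of Jaccard token overlap as a stand-in for the relevance measure, and the thresholds `risk_min` and `relevance_max`. The abstract does not specify any of these.

```python
import re

# Hypothetical cue lexicons, one per category named in the abstract.
# The paper's actual lexicons are not given; these are illustrative only.
RISK_CUES = {
    "treatment_directive": [r"\btake\b", r"\bstop taking\b", r"\bincrease (the|your) dose\b"],
    "contraindication": [r"\bdo not\b", r"\bavoid\b", r"\bcontraindicated\b"],
    "urgency": [r"\bimmediately\b", r"\bemergency\b", r"\bright away\b"],
    "high_risk_medication": [r"\bwarfarin\b", r"\binsulin\b", r"\bopioids?\b"],
}


def risk_score(answer: str) -> float:
    """Fraction of cue categories with at least one match (0.0 to 1.0).

    This scoring rule is an assumption; it ignores clinical correctness
    and only measures how much risk-bearing language is present.
    """
    text = answer.lower()
    hits = sum(
        any(re.search(pattern, text) for pattern in patterns)
        for patterns in RISK_CUES.values()
    )
    return hits / len(RISK_CUES)


def tokens(text: str) -> set:
    """Lowercased alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(answer: str, reference: str) -> float:
    """Jaccard token overlap, used here as a crude stand-in for grounding."""
    a, r = tokens(answer), tokens(reference)
    if not a or not r:
        return 0.0
    return len(a & r) / len(a | r)


def is_high_risk_low_grounding(
    answer: str,
    reference: str,
    risk_min: float = 0.5,       # hypothetical threshold
    relevance_max: float = 0.2,  # hypothetical threshold
) -> bool:
    """Flag answers that carry risky language but are poorly grounded."""
    return (
        risk_score(answer) >= risk_min
        and relevance(answer, reference) <= relevance_max
    )


if __name__ == "__main__":
    answer = "Stop taking warfarin immediately."
    reference = "Consult your clinician before changing any anticoagulant dose."
    print(risk_score(answer))
    print(relevance(answer, reference))
    print(is_high_risk_low_grounding(answer, reference))
```

Under these assumptions, two answers that are equally unsupported by the reference can receive very different flags: an answer with no directive, urgency, or medication language scores zero risk, while one that tells the patient to change a medication scores high. That is the distinction a pure correctness metric would miss.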


Taxonomy

Topics: Topic Modeling · Neurobiology of Language and Bilingualism · Healthcare Decision-Making and Restraints