From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
Zhe Yu, Wenpeng Xing, Meng Han

TL;DR
This paper introduces RETINA-SAFE, a 12,522-sample evidence-grounded benchmark for diabetic retinopathy (DR) decisions, and ECRT, a two-stage white-box framework for hallucination risk triage in medical LLMs that improves Stage-1 balanced accuracy by +0.15 to +0.19 over baselines.
Contribution
The paper presents RETINA-SAFE, a retinal evidence benchmark, and ECRT, a white-box risk triage method that improves hallucination detection and interpretability in medical LLMs.
Findings
ECRT improves Stage-1 balanced accuracy by +0.15 to +0.19 over baselines.
ECRT outperforms a single-stage ablation on risk triage accuracy.
RETINA-SAFE enables evidence-grounded evaluation in diabetic retinopathy.
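For readers unfamiliar with the metric in the first finding, balanced accuracy is the mean of per-class recall, so a detector cannot score well by always predicting the majority class. The snippet below is a minimal sketch of that standard definition; the function name and the toy labels are illustrative assumptions, and the paper's own evaluation protocol is not reproduced here.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: 8 safe cases, 2 unsafe cases, and a detector
# that always answers "safe". Plain accuracy would be 0.8.
y_true = ["safe"] * 8 + ["unsafe"] * 2
y_pred = ["safe"] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this toy input the always-safe detector reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which is why the metric suits imbalanced Safe/Unsafe triage.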
Abstract
Hallucinations in medical large language models (LLMs) remain a safety-critical issue, particularly when available evidence is insufficient or conflicting. We study this problem in diabetic retinopathy (DR) decision settings and introduce RETINA-SAFE, an evidence-grounded benchmark aligned with retinal grading records, comprising 12,522 samples. RETINA-SAFE is organized into three evidence-relation tasks: E-Align (evidence-consistent), E-Conflict (evidence-conflicting), and E-Gap (evidence-insufficient). We further propose ECRT (Evidence-Conditioned Risk Triage), a two-stage white-box detection framework: Stage 1 performs Safe/Unsafe risk triage, and Stage 2 refines unsafe cases into contradiction-driven versus evidence-gap risks. ECRT leverages internal representation and logit shifts under CTX/NOCTX conditions, with class-balanced training for robust learning. Under evidence-grouped…
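The two ideas named in the abstract, a logit shift between CTX and NOCTX conditions and a two-stage triage decision, can be illustrated with a small sketch. Everything here is an assumption made for illustration and not the authors' implementation: the shift is measured as a KL divergence between softmax distributions, the function names (`logit_shift`, `triage`) are hypothetical, the thresholds are arbitrary, and the classifiers that would turn features into the two scores are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_shift(ctx_logits, noctx_logits, eps=1e-12):
    """Assumed shift feature: KL(p_ctx || p_noctx) over answer logits.

    ctx_logits are logits when the evidence is in the prompt (CTX),
    noctx_logits are logits for the same question without it (NOCTX).
    """
    p = softmax(ctx_logits)
    q = softmax(noctx_logits)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def triage(unsafe_score, conflict_score, t1=0.5, t2=0.5):
    """Hypothetical two-stage cascade with illustrative thresholds.

    Stage 1 separates Safe from Unsafe. Stage 2 runs only on unsafe
    cases and splits them into contradiction versus evidence-gap risk.
    """
    if unsafe_score < t1:
        return "safe"
    return "contradiction" if conflict_score >= t2 else "evidence_gap"


# Identical logits give no shift; diverging logits give a positive shift.
print(logit_shift([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(logit_shift([5.0, 0.0, 0.0], [0.0, 0.0, 5.0]) > 0)
print(triage(0.2, 0.9), triage(0.8, 0.9), triage(0.8, 0.1))
```

The cascade shape means Stage 2 never sees cases that Stage 1 marks safe, so any Stage-1 miss cannot be recovered downstream; that is one reason a Stage-1 metric such as balanced accuracy is reported separately.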
