From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

Zhe Yu; Wenpeng Xing; Meng Han

arXiv:2604.05348 · cs.AI · April 8, 2026

TL;DR

This paper introduces RETINA-SAFE, an evidence-grounded retinal benchmark, and ECRT, a two-stage white-box framework for hallucination risk triage in medical LLMs. In diabetic retinopathy decision settings, ECRT improves Stage-1 balanced accuracy by +0.15 to +0.19 over baselines.

Contribution

The paper presents a retinal evidence benchmark and a white-box risk triage method that improves hallucination detection and interpretability in medical LLMs.

Findings

01  ECRT improves Stage-1 balanced accuracy by +0.15 to +0.19 over baselines.

02  ECRT outperforms a single-stage ablation on risk triage accuracy.

03  RETINA-SAFE enables evidence-grounded evaluation in diabetic retinopathy.
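
The first finding is reported in balanced accuracy, which is the mean of per-class recall. The sketch below is a minimal illustration of that standard metric, not the authors' evaluation code; the function name and the toy labels are made up for the example and do not come from RETINA-SAFE.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels (illustrative only, not benchmark data).
y_true = ["safe"] * 8 + ["unsafe"] * 2
always_safe = ["safe"] * 10

# Plain accuracy would be 0.8 here; balanced accuracy exposes the
# fact that no unsafe case was caught.
print(balanced_accuracy(y_true, always_safe))  # 0.5
```

Because a detector that always answers "safe" scores only 0.5 on this metric, a gain of +0.15 to +0.19 reflects better recall across both classes rather than a shift toward the majority class.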

Abstract

Hallucinations in medical large language models (LLMs) remain a safety-critical issue, particularly when available evidence is insufficient or conflicting. We study this problem in diabetic retinopathy (DR) decision settings and introduce RETINA-SAFE, an evidence-grounded benchmark aligned with retinal grading records, comprising 12,522 samples. RETINA-SAFE is organized into three evidence-relation tasks: E-Align (evidence-consistent), E-Conflict (evidence-conflicting), and E-Gap (evidence-insufficient). We further propose ECRT (Evidence-Conditioned Risk Triage), a two-stage white-box detection framework: Stage 1 performs Safe/Unsafe risk triage, and Stage 2 refines unsafe cases into contradiction-driven versus evidence-gap risks. ECRT leverages internal representation and logit shifts under CTX/NOCTX conditions, with class-balanced training for robust learning. Under evidence-grouped…
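
The two-stage control flow described in the abstract can be sketched roughly as follows. This is a hedged illustration only: the abstract does not specify ECRT's features or classifiers, so the feature summary (`logit_shift_features`), the label strings, and the threshold rules in `toy_stage1` and `toy_stage2` are hypothetical stand-ins for learned models, and the internal-representation shifts that ECRT also uses are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_shift_features(logits_ctx, logits_noctx):
    """Summarize how the answer distribution moves when evidence is given.

    logits_ctx: answer logits with the evidence in the prompt (CTX).
    logits_noctx: answer logits without the evidence (NOCTX).
    The chosen features are assumptions, not the paper's feature set.
    """
    p_ctx = softmax(logits_ctx)
    p_no = softmax(logits_noctx)
    # KL(p_ctx || p_no); terms with a zero probability are skipped.
    kl = sum(p * math.log(p / q) for p, q in zip(p_ctx, p_no) if p > 0 and q > 0)
    top_ctx = max(range(len(p_ctx)), key=p_ctx.__getitem__)
    top_no = max(range(len(p_no)), key=p_no.__getitem__)
    return {"kl": kl, "flipped": top_ctx != top_no, "conf_ctx": p_ctx[top_ctx]}


def toy_stage1(features):
    """Hypothetical Stage 1 rule: Safe/Unsafe (arbitrary thresholds)."""
    if features["kl"] > 0.1 or features["conf_ctx"] < 0.6:
        return "unsafe"
    return "safe"


def toy_stage2(features):
    """Hypothetical Stage 2 rule: split unsafe cases into two risk types."""
    return "contradiction" if features["flipped"] else "evidence_gap"


def two_stage_triage(features, stage1=toy_stage1, stage2=toy_stage2):
    """Stage 1 screens for risk; Stage 2 runs only on unsafe cases."""
    if stage1(features) == "safe":
        return "safe"
    return stage2(features)


# Toy logits over three answer options (illustrative values only).
stable = logit_shift_features([4.0, 0.0, 0.0], [4.0, 0.0, 0.0])
flipped = logit_shift_features([0.0, 4.0, 0.0], [4.0, 0.0, 0.0])
unsure = logit_shift_features([0.2, 0.1, 0.0], [0.2, 0.1, 0.0])

print(two_stage_triage(stable))   # safe
print(two_stage_triage(flipped))  # contradiction
print(two_stage_triage(unsure))   # evidence_gap
```

In a real system both stages would be trained classifiers over richer features; passing them as callables keeps the Safe/Unsafe screen separate from the contradiction versus evidence-gap refinement, which is the structure the abstract describes.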
