VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs

Yurui Xiang; Xingyi Mao; Rui Sheng; Zixin Chen; Zelin Zang; Yuyang Wu; Haipeng Zeng; Huamin Qu; Yushi Sun; Yanna Lin

arXiv:2604.23356 · cs.CL · April 28, 2026

TL;DR

VeriLLMed is a visual analytics system designed to help developers debug medical large language models by integrating biomedical knowledge graphs to identify and analyze recurring diagnostic errors.

Contribution

The paper introduces VeriLLMed, an interactive visual analytics system that uses external biomedical knowledge graphs to help developers debug medical LLMs, interpret model errors, and identify recurring error patterns.

Findings

01 VeriLLMed enables identification of relation, branch, and missing errors in medical LLMs.

02 Expert evaluation shows VeriLLMed improves error detection and understanding.

03 Case studies demonstrate VeriLLMed's effectiveness in clinical error analysis.

Abstract

Large language models (LLMs) show promise in medical diagnosis, but real-world deployment remains challenging due to high-stakes clinical decisions and imperfect reasoning reliability. As a result, careful inspection of model behavior is essential for assessing whether diagnostic reasoning is reliable and clinically grounded. However, debugging medical LLMs remains difficult. First, developers often lack sufficient medical domain expertise to interpret model errors in clinically meaningful terms. Second, models can fail across a large and diverse set of instances involving different input types, tasks, and reasoning steps, making it challenging for developers to prioritize which errors deserve focused inspection. Third, developers struggle to identify recurring error patterns across cases, as existing debugging practices are largely instance-centric and rely on manual inspection of…
