VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs
Yurui Xiang, Xingyi Mao, Rui Sheng, Zixin Chen, Zelin Zang, Yuyang Wu, Haipeng Zeng, Huamin Qu, Yushi Sun, Yanna Lin

TL;DR
VeriLLMed is a visual analytics system designed to help developers debug medical large language models by integrating biomedical knowledge graphs to identify and analyze recurring diagnostic errors.
Contribution
The paper introduces VeriLLMed, an interactive visual analytics system that uses external biomedical knowledge graphs to help developers interpret medical LLM errors and identify recurring error patterns.
Findings
VeriLLMed helps developers identify relation errors, branch errors, and missing errors in medical LLMs.
Expert evaluation shows VeriLLMed improves error detection and understanding.
Case studies demonstrate VeriLLMed's effectiveness in clinical error analysis.
Abstract
Large language models (LLMs) show promise in medical diagnosis, but real-world deployment remains challenging due to high-stakes clinical decisions and imperfect reasoning reliability. As a result, careful inspection of model behavior is essential for assessing whether diagnostic reasoning is reliable and clinically grounded. However, debugging medical LLMs remains difficult. First, developers often lack sufficient medical domain expertise to interpret model errors in clinically meaningful terms. Second, models can fail across a large and diverse set of instances involving different input types, tasks, and reasoning steps, making it challenging for developers to prioritize which errors deserve focused inspection. Third, developers struggle to identify recurring error patterns across cases, as existing debugging practices are largely instance-centric and rely on manual inspection of…
