Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs

Shasha Zhou; Mingyu Huang; Jack Cole; Charles Britton; Ming Yin; Jan Wolber; Ke Li

arXiv:2511.12817·cs.LG·December 22, 2025

TL;DR

This paper explores using medical knowledge graphs to automatically evaluate the factual accuracy of responses generated by large language models in healthcare, aiming to improve reliability and trustworthiness.

Contribution

The paper introduces FAITH, a novel KG-based framework for factuality evaluation of medical LLM responses without needing reference answers.

Findings

01. KG-grounded evaluation correlates better with clinician judgments

02. Effective in distinguishing LLM capabilities

03. Robust to textual variances

Abstract

The recent proliferation of large language models (LLMs) holds the potential to revolutionize healthcare, with strong capabilities in diverse medical tasks. Yet, deploying LLMs in high-stakes healthcare settings requires rigorous verification and validation to understand any potential harm. This paper investigates the reliability and viability of using medical knowledge graphs (KGs) for the automated factuality evaluation of LLM-generated responses. To ground this investigation, we introduce FAITH, a framework designed to systematically probe the strengths and limitations of this KG-based approach. FAITH operates without reference answers by decomposing responses into atomic claims, linking them to a medical KG, and scoring them based on evidence paths. Experiments on diverse medical tasks with human subjective evaluations demonstrate that KG-grounded evaluation achieves considerably…
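The three stages named in the abstract (atomic claims, KG linking, evidence-path scoring) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not FAITH's implementation: the toy graph `TOY_KG`, the exact-match `link_entity`, the representation of a claim as an entity pair, and the inverse-path-length score are all hypothetical. Claim decomposition is assumed to have happened upstream, so the functions take already-extracted claims as input.

```python
from collections import deque

# Toy undirected knowledge graph (hypothetical edges, for illustration only;
# a real medical KG would carry typed relations and far more nodes).
TOY_KG = {
    "metformin": {"type 2 diabetes", "lactic acidosis"},
    "type 2 diabetes": {"metformin", "hyperglycemia"},
    "hyperglycemia": {"type 2 diabetes"},
    "lactic acidosis": {"metformin"},
}


def link_entity(mention, kg):
    """Naive entity linking (assumption): exact match after lowercasing."""
    key = mention.strip().lower()
    return key if key in kg else None


def shortest_path_length(kg, source, target, max_hops=3):
    """Breadth-first search; returns the hop count, or None if the target
    is not reachable within max_hops."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in kg.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def score_claim(claim, kg, max_hops=3):
    """Score one (head, tail) claim: 1/hops if an evidence path exists,
    0.0 if either entity is unlinked or no path is found (assumed scoring rule)."""
    head = link_entity(claim[0], kg)
    tail = link_entity(claim[1], kg)
    if head is None or tail is None:
        return 0.0
    hops = shortest_path_length(kg, head, tail, max_hops)
    if hops is None:
        return 0.0
    return 1.0 / max(hops, 1)


def score_response(claims, kg, max_hops=3):
    """Average the claim scores; an empty claim list scores 0.0."""
    if not claims:
        return 0.0
    return sum(score_claim(c, kg, max_hops) for c in claims) / len(claims)
```

In this sketch a directly connected pair such as `("Metformin", "type 2 diabetes")` scores 1.0, a two-hop pair scores 0.5, and a claim mentioning an entity absent from the graph scores 0.0. No reference answer is consulted at any point, which mirrors the reference-free property the abstract describes.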

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education