VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records

Philip Chung; Akshay Swaminathan; Alex J. Goodell; Yeasul Kim; S. Momsen Reincke; Lichy Han; Ben Deverett; Mohammad Amin Sadeghi; Abdel-Badih Ariss; Marc Ghanem; David Seong; Andrew A. Lee; Caitlin E. Coombes; Brad Bradshaw; Mahir A. Sufian; Hyo Jung Hong; Teresa P. Nguyen; Mohammad R. Rasouli; Komal Kamra; Mark A. Burbridge; James C. McAvoy; Roya Saffary; Stephen P. Ma; Dev Dash; James Xie; Ellen Y. Wang; Clifford A. Schmiesing; Nigam Shah; Nima Aghaeepour

arXiv:2501.16672·cs.AI·January 29, 2025·2 cites

TL;DR

VeriFact is an AI system that checks whether LLM-generated clinical text is supported by a patient's electronic health record. Its agreement with clinician ground truth (up to 92.7%) exceeds the highest agreement between clinicians (88.5%), which could facilitate EHR-based language model applications.

Contribution

Introduces VeriFact, which combines retrieval-augmented generation and LLM-as-a-Judge, along with the VeriFact-BHC dataset for evaluating fact verification in clinical text.

Findings

01

VeriFact achieves up to 92.7% agreement with a denoised and adjudicated clinician ground truth.

02

VeriFact's agreement exceeds the highest agreement observed between clinicians (88.5%).

03

The system can accelerate development of LLM-based EHR applications.

Abstract

Methods to ensure factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check…
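The flow the abstract describes (split text into simple statements, retrieve relevant EHR notes for each, have a judge label each statement, then measure agreement with clinician labels) can be illustrated with a small sketch. This is not the authors' implementation: the token-overlap retriever and the rule-based `toy_judge` are hypothetical stand-ins for a real retriever and an LLM judge, the binary label names are assumed, and the notes, statements, and clinician labels are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(statement, notes, k=2):
    """Toy retriever: return up to k notes ranked by token overlap with the statement."""
    query = Counter(tokenize(statement))
    scored = []
    for note in notes:
        # Multiset intersection counts tokens shared by the statement and the note.
        overlap = sum((query & Counter(tokenize(note))).values())
        scored.append((overlap, note))
    # sort() is stable, so ties keep the original note order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for overlap, note in scored[:k] if overlap > 0]


def toy_judge(statement, evidence):
    """Hypothetical stand-in for an LLM judge.

    Labels a statement "supported" only if every one of its tokens
    appears somewhere in the retrieved evidence.
    """
    statement_tokens = set(tokenize(statement))
    evidence_tokens = set(tokenize(" ".join(evidence)))
    if statement_tokens and statement_tokens <= evidence_tokens:
        return "supported"
    return "not_supported"


def percent_agreement(predicted, reference):
    """Percentage of positions where two equal-length label lists match."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Invented example data, not drawn from VeriFact-BHC.
notes = [
    "patient admitted with pneumonia",
    "started on ceftriaxone on day 1",
    "discharged home on day 5",
]
statements = [
    "patient admitted with pneumonia",
    "patient started on ceftriaxone",
    "patient discharged to rehab",
]
clinician_labels = ["supported", "not_supported", "not_supported"]

predictions = [toy_judge(s, retrieve(s, notes)) for s in statements]
print(predictions)
print(f"{percent_agreement(predictions, clinician_labels):.1f}% agreement")
```

Percent agreement is used here because it is the metric the abstract reports. A real system would replace the overlap retriever with a proper search over the patient's notes and the rule with an LLM call.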

Code & Models

Repositories

philipchung/verifact (Official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training