VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
Philip Chung, Akshay Swaminathan, Alex J. Goodell, Yeasul Kim, S. Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel-Badih Ariss, Marc Ghanem, David Seong, Andrew A. Lee, Caitlin E. Coombes, Brad Bradshaw, Mahir A. Sufian, Hyo Jung Hong, Teresa P. Nguyen

TL;DR
VeriFact is an AI system that checks whether LLM-generated clinical text is factually supported by a patient's electronic health record (EHR). It reaches higher agreement with an adjudicated clinician ground truth than clinicians reach with one another, which could facilitate EHR-based language model applications.
Contribution
Introduces VeriFact, which combines retrieval-augmented generation and LLM-as-a-Judge, along with the VeriFact-BHC dataset for evaluating fact verification in clinical text.
Findings
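VeriFact achieves up to 92.7% agreement with a denoised and adjudicated clinician ground truth.
The highest agreement between clinicians was 88.5%, suggesting that VeriFact exceeds the average clinician's ability to fact-check.
The system can accelerate development of LLM-based EHR applications.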
Abstract
Methods to ensure the factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check…
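The abstract describes a retrieve-then-judge loop scored by percent agreement with clinician labels. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the token-overlap retriever, the rule-based `toy_judge` standing in for an LLM judge, the label strings `supported` and `not_supported`, and all function names and sample data.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def retrieve(statement: str, ehr_facts: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank EHR snippets by token overlap."""
    query = Counter(tokenize(statement))

    def overlap(snippet: str) -> int:
        return sum((query & Counter(tokenize(snippet))).values())

    return sorted(ehr_facts, key=overlap, reverse=True)[:k]


def toy_judge(statement: str, evidence: List[str]) -> str:
    """Hypothetical stand-in for an LLM judge: a statement counts as
    supported only if every one of its tokens appears in the evidence."""
    evidence_tokens = set(tokenize(" ".join(evidence)))
    if set(tokenize(statement)) <= evidence_tokens:
        return "supported"
    return "not_supported"


def verify(
    statements: List[str],
    ehr_facts: List[str],
    judge: Callable[[str, List[str]], str] = toy_judge,
    k: int = 2,
) -> List[str]:
    """Retrieve evidence for each statement, then ask the judge for a label."""
    return [judge(s, retrieve(s, ehr_facts, k)) for s in statements]


def percent_agreement(predicted: List[str], reference: List[str]) -> float:
    """Share of items, in percent, where the two label lists match."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("label lists must be non-empty and equal in length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Made-up example data, not drawn from VeriFact-BHC.
ehr_facts = [
    "patient received metoprolol for atrial fibrillation",
    "chest x-ray showed no pneumonia",
    "patient discharged home on day 3",
]
statements = [
    "patient received metoprolol",
    "patient received warfarin",
    "patient discharged home",
]
predicted = verify(statements, ehr_facts)
clinician_labels = ["supported", "not_supported", "not_supported"]
agreement = percent_agreement(predicted, clinician_labels)
```

In a real system the `judge` argument would wrap an LLM call and the retriever would search the patient's clinical notes; passing the judge as a function keeps the agreement calculation independent of either choice.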
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
