MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification
Heyuan Huang, Alexandra DeLucia, Vijay Murari Tiyyala, Mark Dredze

TL;DR
MedScore is a domain-adapted, modular pipeline for evaluating the factuality of free-form medical answers. It decomposes each answer into claims and verifies them against in-domain data, which reduces hallucination during evaluation.
Contribution
The paper introduces MedScore, a domain-specific factuality evaluation pipeline that improves claim decomposition and verification for medical answers and outperforms existing methods.
Findings
Extracts up to three times more valid facts than previous methods.
Reduces hallucination and vague references in medical answer evaluation.
Factuality scores vary significantly with different decomposition and verification methods.
Abstract
While Large Language Models (LLMs) can generate fluent and convincing responses, they are not necessarily correct. This is especially apparent in the popular decompose-then-verify factuality evaluation pipeline, where LLMs evaluate generations by decomposing the generations into individual, valid claims. Factuality evaluation is especially important for medical answers, since incorrect medical information could seriously harm the patient. However, existing factuality systems are a poor match for the medical domain, as they are typically only evaluated on objective, entity-centric, formulaic texts such as biographies and historical topics. This differs from condition-dependent, conversational, hypothetical, sentence-structure diverse, and subjective medical answers, which makes decomposition into valid facts challenging. We propose MedScore, a new pipeline to decompose medical answers…
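The decompose-then-verify idea described in the abstract can be illustrated with a minimal sketch. This is not the MedScore implementation: the sentence-level `decompose` function, the `toy_verify` lookup, and the `REFERENCE` set are hypothetical stand-ins for the LLM-based decomposer and the in-domain verifier, and the score is assumed here to be the fraction of claims judged supported.

```python
import re
from typing import Callable, List


def decompose(answer: str) -> List[str]:
    """Toy stand-in for an LLM decomposer: treat each sentence as one claim."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def factuality_score(answer: str, verify: Callable[[str], bool]) -> float:
    """Fraction of decomposed claims that the verifier marks as supported."""
    claims = decompose(answer)
    if not claims:
        return 0.0  # assumption: an answer with no claims scores zero
    supported = sum(1 for claim in claims if verify(claim))
    return supported / len(claims)


# Hypothetical reference corpus; a real verifier would query in-domain data.
REFERENCE = {"ibuprofen can irritate the stomach."}


def toy_verify(claim: str) -> bool:
    """Exact-match lookup, used only to make the sketch runnable."""
    return claim.lower() in REFERENCE


answer = "Ibuprofen can irritate the stomach. It cures all infections."
score = factuality_score(answer, toy_verify)
print(score)
```

Because the score depends on both which claims the decomposer emits and how the verifier judges them, swapping either component changes the result, which is consistent with the finding that factuality scores vary with the decomposition and verification methods.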
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Misinformation and Its Impacts
