VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang

TL;DR
VeriFact is a new framework that improves the evaluation of long-form responses from language models by better extracting and verifying facts, supported by a novel benchmark that measures both precision and recall.
Contribution
The paper introduces VeriFact, a framework for improved factuality evaluation, and FactRBench, a benchmark that assesses both precision and recall in long-form model responses.
Findings
VeriFact enhances fact completeness and preserves relational facts.
Larger models improve both precision and recall, but high precision doesn't always mean high recall.
FactRBench enables comprehensive evaluation of factuality in long-form responses.
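To make the precision/recall distinction concrete, here is a minimal sketch, not the paper's implementation. It assumes that each fact extracted from a response has already been judged supported or unsupported, and that each reference fact has already been judged covered or not covered by the response. The function names and the example judgments are hypothetical.

```python
def factual_precision(verdicts: list[bool]) -> float:
    """Share of the response's extracted facts that were judged supported."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def factual_recall(covered: list[bool]) -> float:
    """Share of the reference facts that the response was judged to cover."""
    return sum(covered) / len(covered) if covered else 0.0


# Hypothetical judgments for one response (illustrative values only).
response_verdicts = [True, True, True, True]            # 4 extracted facts, all supported
reference_covered = [True, False, False, False, True]   # 2 of 5 reference facts covered

precision = factual_precision(response_verdicts)
recall = factual_recall(reference_covered)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every stated fact is correct, yet most reference facts are missing, so precision is high while recall is low.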
Abstract
Large language models (LLMs) excel at generating long-form responses, but evaluating their factuality remains challenging due to complex inter-sentence dependencies within the generated facts. Prior solutions predominantly follow a decompose-decontextualize-verify pipeline but often fail to capture essential context and miss key relational facts. In this paper, we introduce VeriFact, a factuality evaluation framework designed to enhance fact extraction by identifying and resolving incomplete and missing facts to support more accurate verification results. Moreover, we introduce FactRBench, a benchmark that evaluates both precision and recall in long-form model responses, whereas prior work primarily focuses on precision. FactRBench provides reference fact sets from advanced LLMs and human-written answers, enabling recall assessment. Empirical evaluations show that VeriFact…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
