LongRecall: A Structured Approach for Robust Recall Evaluation in Long-Form Text
MohammadJavad Ardestani, Ehsan Kamalloo, Davood Rafiei

TL;DR
LongRecall is a structured, multi-stage framework for evaluating recall in long-form text generation, reducing errors and improving accuracy by decomposing answers and verifying facts through lexical, semantic, and entailment checks.
Contribution
The paper introduces a three-stage evaluation framework that improves the accuracy of recall assessment in long-form QA by decomposing answers into facts and applying structured verification.
Findings
Significantly improves recall accuracy over baseline methods
Reduces false positives and negatives in recall evaluation
Effective across multiple long-form QA benchmarks
Abstract
The completeness of machine-generated text, ensuring that it captures all relevant information, is crucial in domains such as medicine and law and in tasks like list-based question answering (QA), where omissions can have serious consequences. However, existing recall metrics often depend on lexical overlap, leading to errors with unsubstantiated entities and paraphrased answers, while LLM-as-a-Judge methods with long holistic prompts capture broader semantics but remain prone to misalignment and hallucinations without structured verification. We introduce LongRecall, a general three-stage recall evaluation framework that decomposes answers into self-contained facts, successively narrows plausible candidate matches through lexical and semantic filtering, and verifies their alignment through structured entailment checks. This design reduces false positives and false negatives…
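The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name and threshold below is hypothetical, sentence splitting stands in for fact decomposition, a character-level similarity ratio stands in for semantic filtering, and a token-subset test stands in for the entailment check (which in practice would likely be an NLI model or an LLM call).

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List, Set


def decompose(text: str) -> List[str]:
    """Stage 1 stand-in: split an answer into sentences (assumed proxy for facts)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def lexical_score(a: str, b: str) -> float:
    """Jaccard overlap between token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def semantic_score(a: str, b: str) -> float:
    """Assumed stand-in for embedding similarity: character-level ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder entailment: all hypothesis tokens occur in the premise."""
    return tokens(hypothesis) <= tokens(premise)


def recall_sketch(
    reference_facts: List[str],
    answer: str,
    entails: Callable[[str, str], bool] = naive_entails,
    lex_min: float = 0.2,   # hypothetical threshold
    sem_min: float = 0.4,   # hypothetical threshold
) -> float:
    """Fraction of reference facts supported by some fact in the answer."""
    candidates = decompose(answer)
    covered = 0
    for fact in reference_facts:
        # Stage 2: narrow candidates lexically, then semantically.
        shortlist = [c for c in candidates if lexical_score(fact, c) >= lex_min]
        shortlist = [c for c in shortlist if semantic_score(fact, c) >= sem_min]
        # Stage 3: verify the surviving candidates with the entailment check.
        if any(entails(c, fact) for c in shortlist):
            covered += 1
    return covered / len(reference_facts) if reference_facts else 0.0
```

Filtering before verification keeps the costly entailment step limited to a short list of plausible matches; passing a different `entails` callable swaps in a stronger verifier without changing the rest of the sketch.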
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
