LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation
Jennifer A Bishop, Qianqian Xie, Sophia Ananiadou

TL;DR
This paper introduces LongDocFACTScore, an evaluation framework for assessing factual consistency in long document summarisation. It is designed to avoid the token limits that restrict existing metrics based on pre-trained language models, and it correlates better with human judgments.
Contribution
The paper presents LongDocFACTScore, a novel framework for evaluating factuality in long document summarisation, and introduces LongSciVerify, a human-annotated dataset for this purpose.
Findings
LongDocFACTScore correlates more closely with human judgments of factuality than existing automatic metrics.
The framework efficiently extends to any document length.
The dataset provides detailed factuality annotations for scientific summaries.
Abstract
Maintaining factual consistency is a critical issue in abstractive text summarisation; however, it cannot be assessed by traditional automatic metrics used for evaluating text summarisation, such as ROUGE scoring. Recent efforts have been devoted to developing improved metrics for measuring factual consistency using pre-trained language models, but these metrics have restrictive token limits, and are therefore not suitable for evaluating long document text summarisation. Moreover, there is limited research and resources available for evaluating whether existing automatic evaluation metrics are fit for purpose when applied in long document settings. In this work, we evaluate the efficacy of automatic metrics for assessing the factual consistency of long document text summarisation. We create a human-annotated data set for evaluating automatic factuality metrics, LongSciVerify, which…
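To illustrate why an overlap metric such as ROUGE cannot assess factual consistency, the sketch below computes a simplified unigram-overlap F1 score. It is an illustrative approximation of ROUGE-1, not the official implementation (no stemming or stopword handling), and the function name `rouge1_f1` and the three example sentences are invented for this illustration rather than taken from the paper.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: F1 over whitespace-tokenised, lowercased unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, made up for this example.
reference = "the drug reduced mortality in patients with sepsis"
faithful = "the drug lowered mortality in sepsis patients"
unfaithful = "the drug increased mortality in patients with sepsis"

score_faithful = rouge1_f1(faithful, reference)      # paraphrase, same meaning
score_unfaithful = rouge1_f1(unfaithful, reference)  # one word flips the claim

print(f"faithful:   {score_faithful:.3f}")
print(f"unfaithful: {score_unfaithful:.3f}")
```

In this toy case the summary that contradicts the reference scores higher than the faithful paraphrase, because only a single token differs. Surface overlap rewards shared wording, not agreement in meaning.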
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
