LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation

Jennifer A Bishop; Qianqian Xie; Sophia Ananiadou

arXiv:2309.12455·cs.CL·May 29, 2024

Open Access · 1 Repo

TL;DR

This paper introduces LongDocFACTScore, a new evaluation framework for assessing factual consistency in long document summarisation, addressing limitations of existing metrics and demonstrating improved correlation with human judgments.

Contribution

The paper presents LongDocFACTScore, a novel framework for evaluating factuality in long document summarisation, and introduces LongSciVerify, a human-annotated dataset for this purpose.

Findings

01. LongDocFACTScore outperforms existing metrics in correlating with human judgments.

02. The framework efficiently extends to any document length.

03. The dataset provides detailed factuality annotations for scientific summaries.

Abstract

Maintaining factual consistency is a critical issue in abstractive text summarisation; however, it cannot be assessed by traditional automatic metrics used for evaluating text summarisation, such as ROUGE scoring. Recent efforts have been devoted to developing improved metrics for measuring factual consistency using pre-trained language models, but these metrics have restrictive token limits and are therefore not suitable for evaluating long document text summarisation. Moreover, limited research and few resources are available for evaluating whether existing automatic evaluation metrics are fit for purpose when applied in long document settings. In this work, we evaluate the efficacy of automatic metrics for assessing the factual consistency of long document text summarisation. We create a human-annotated data set for evaluating automatic factuality metrics, LongSciVerify, which…
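
The abstract's point that overlap metrics such as ROUGE cannot assess factual consistency can be illustrated with a small sketch. The function below is a simplified unigram-overlap F1 in the spirit of ROUGE-1, not the official ROUGE implementation and not part of LongDocFACTScore; the function name `rouge1_f1` and the example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences (not taken from the paper or its data set).
reference = "the drug reduced mortality in the treated patients"
faithful = "the drug reduced mortality in treated patients"
unfaithful = "the drug increased mortality in the treated patients"

score_faithful = rouge1_f1(reference, faithful)
score_unfaithful = rouge1_f1(reference, unfaithful)

print(f"faithful:   {score_faithful:.3f}")
print(f"unfaithful: {score_unfaithful:.3f}")
```

Under these assumptions the candidate that reverses the claim still shares seven of its eight tokens with the reference, so its overlap score stays high even though it states the opposite finding. This is the gap that model-based factuality metrics aim to close.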

Code & Models

Repositories

jbshp/longdocfactscore
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies