MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, and Ioannis Konstas

arXiv:2109.10650 · cs.CL · September 23, 2021

TL;DR

MiRANews introduces a new dataset and benchmarks for multi-resource-assisted news summarization, showing that supplementary documents reduce factual hallucinations in generated summaries.

Contribution

The paper presents MiRANews, a novel dataset for multi-resource-assisted summarization, and demonstrates that using auxiliary documents reduces hallucinations in news summaries.

Findings

01. Assisted summarization reduces hallucinations by 55% compared with single-document summarization.

02. More than 27% of facts in the gold summaries are better grounded in the auxiliary documents than in the main source article.

03. Models using multiple resources produce more factually accurate summaries.

Abstract

One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models tending to hallucinate facts that are erroneous. In this paper, we mitigate this problem with the help of multiple supplementary resource documents assisting the task. We present a new dataset MiRANews and benchmark existing summarization models. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better…

Code & Models

Repositories

xinnuoxu/miranews (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies