MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, and Ioannis Konstas

TL;DR
MiRANews introduces a dataset and benchmarks for multi-resource-assisted news summarization, in which supplementary documents help a model summarize a single main article. Leveraging these documents reduces extrinsic hallucinations and improves the factual accuracy of summaries.
Contribution
The paper presents MiRANews, a novel dataset for multi-resource-assisted summarization, and demonstrates that using auxiliary documents reduces hallucinations in news summaries.
Findings
Assisted summarization reduces hallucinations by 55% compared with single-document summarization.
More than 27% of the facts in the gold summaries are better grounded in the auxiliary documents than in the main source article.
Models using multiple resources produce more factually accurate summaries.
Abstract
One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document, which are often derived via world knowledge. This causes summarization systems to act more like open-ended language models tending to hallucinate facts that are erroneous. In this paper, we mitigate this problem with the help of multiple supplementary resource documents assisting the task. We present a new dataset MiRANews and benchmark existing summarization models. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
