Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Yao Lu; Yue Dong; Laurent Charlin

arXiv:2010.14235·cs.CL·October 28, 2020

TL;DR

Multi-XScience is a large-scale dataset for extreme multi-document summarization of scientific articles. The task is to write a paper's related-work section from its abstract and the articles it references, a setup that favors abstractive modeling.

Contribution

The paper introduces Multi-XScience, a novel large-scale dataset for multi-document summarization in scientific literature, focusing on related-work section generation from multiple sources.

Findings

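01 Several state-of-the-art models were trained on Multi-XScience to benchmark the task

02 The dataset construction protocol, inspired by extreme summarization, favors abstractive approaches

03 Descriptive statistics and empirical results both indicate that the dataset is well suited to abstractive models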

Abstract

Multi-document summarization is a challenging task for which there exist few large-scale datasets. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results (using several state-of-the-art models trained on the Multi-XScience dataset) reveal that Multi-XScience is well suited for abstractive models.
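
To make the task setup concrete, the following is a minimal Python sketch, not the authors' code. It assumes a hypothetical record layout (`Example`, with fields `abstract`, `ref_abstracts`, and `related_work`), a hypothetical separator string, and plain whitespace tokenization. It shows how the query abstract and the referenced articles could be joined into one source text, and how the share of target n-grams that never appear in the source could be measured. That share is one common descriptive statistic for how abstractive a summarization dataset is; whether it matches the exact statistics reported in the paper is not confirmed here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Example:
    """Hypothetical record layout; the field names are illustrative only."""
    abstract: str             # abstract of the paper being written
    ref_abstracts: List[str]  # text of the articles it references
    related_work: str         # target: the related-work section


def build_source(ex: Example, sep: str = " ||| ") -> str:
    """Join the query abstract and the reference texts into one source string."""
    return sep.join([ex.abstract] + ex.ref_abstracts)


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercased, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source: str, target: str, n: int) -> float:
    """Fraction of distinct target n-grams that do not occur in the source."""
    target_ngrams = ngrams(target, n)
    if not target_ngrams:
        return 0.0
    return len(target_ngrams - ngrams(source, n)) / len(target_ngrams)


# Toy usage with made-up text (not taken from the dataset).
example = Example(
    abstract="we study graph summarization",
    ref_abstracts=["prior work uses graph kernels"],
    related_work="graph kernels were used in prior work",
)
source_text = build_source(example)
unigram_novelty = novel_ngram_ratio(source_text, example.related_work, 1)
bigram_novelty = novel_ngram_ratio(source_text, example.related_work, 2)
```

A higher ratio means the target reuses less wording from its sources, so copy-based extractive methods have less to work with. A real analysis would swap in a proper tokenizer and the dataset's actual field names.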

Code & Models

Repositories

yaolu/Multi-XScience (official)

Models

OctaSpace/Mistral7B-fintuned-multi_x_science
