Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
Yao Lu, Yue Dong, Laurent Charlin

TL;DR
Multi-XScience is a large-scale dataset for extreme multi-document summarization of scientific articles. The task is to generate a paper's related-work section from its abstract and the articles it references, a setup that favors abstractive models.
Contribution
The paper introduces Multi-XScience, a large-scale multi-document summarization dataset built from scientific articles, together with the task of writing a paper's related-work section from its abstract and the articles it references.
Findings
Several state-of-the-art models are trained and evaluated on the dataset
The construction protocol, inspired by extreme summarization, favors abstractive approaches
Descriptive statistics and empirical results indicate the dataset is well suited to abstractive models
Abstract
Multi-document summarization is a challenging task for which there exist few large-scale datasets. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained with several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.
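To make the task shape concrete, here is a minimal sketch of how one example might be represented and flattened into a single model input. The class name, field names, separator string, and sample texts are all hypothetical and chosen for illustration. They are not taken from the paper or from the released dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedWorkExample:
    """One hypothetical task example: several sources in, related-work text out."""
    query_abstract: str  # abstract of the paper whose related work is being written
    reference_texts: List[str] = field(default_factory=list)  # text from each referenced article
    related_work: str = ""  # target summary: the related-work section


def build_source(example: RelatedWorkExample, sep: str = " ||| ") -> str:
    """Join the abstract and the referenced-article texts into one input string.

    The separator is an assumption; a real system would pick whatever
    delimiter its tokenizer or model expects.
    """
    return sep.join([example.query_abstract, *example.reference_texts])


# Made-up sample data, for illustration only.
example = RelatedWorkExample(
    query_abstract="We propose a dataset for multi-document summarization.",
    reference_texts=[
        "Prior work A studies news summarization.",
        "Prior work B studies extreme summarization.",
    ],
    related_work="Earlier datasets focus on news (A) or single documents (B).",
)
source = build_source(example)
```

A sequence-to-sequence summarizer would then be trained to map `source` to `example.related_work`. Concatenation is only one possible way to present multiple documents to a model.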
