Data-driven Summarization of Scientific Articles
Nikola I. Nikolov, Michael Pfeiffer, Richard H.R. Hahnloser

TL;DR
This paper introduces new datasets derived from scientific articles for training and evaluating multi-sentence summarization models, demonstrating their suitability and providing benchmarks for long-sequence summarization tasks.
Contribution
The paper builds two large-scale multi-sentence summarization datasets from scientific articles, using the title and the abstract as summaries at two different levels, and evaluates a wide range of existing extractive and abstractive neural models on them.
Findings
Scientific articles are suitable for data-driven summarization.
A wide range of existing extractive and abstractive neural summarization models can be trained and evaluated on the scientific article datasets.
The datasets serve as benchmarks for long-sequence summarization.
Abstract
Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles. Such models are typically trained on input-summary pairs consisting of only a single or a few sentences, partially due to limited availability of multi-sentence training data. Here, we propose to use scientific articles as a new milestone for text summarization: large-scale training data come almost for free with two types of high-quality summaries at different levels - the title and the abstract. We generate two novel multi-sentence summarization datasets from scientific articles and test the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches. Our analysis demonstrates that scientific papers are suitable for data-driven text summarization. Our results could serve as valuable benchmarks for…
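The pairing idea in the abstract, where the title and the abstract act as two levels of summary, can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Article` record, its field names, and the `build_pairs` helper are hypothetical, and the choice of the abstract as input for the title and the body as input for the abstract is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (input text, target summary)


@dataclass
class Article:
    # Hypothetical record layout; the field names are illustrative only.
    title: str
    abstract: str
    body: str


def build_pairs(articles: Iterable[Article]) -> Tuple[List[Pair], List[Pair]]:
    """Return (abstract -> title) pairs and (body -> abstract) pairs.

    Assumption: the abstract is the input when the title is the summary,
    and the body is the input when the abstract is the summary.
    """
    title_pairs: List[Pair] = []
    abstract_pairs: List[Pair] = []
    for art in articles:
        title = art.title.strip()
        abstract = art.abstract.strip()
        body = art.body.strip()
        # Skip records that lack either the summary or its source text.
        if title and abstract:
            title_pairs.append((abstract, title))
        if abstract and body:
            abstract_pairs.append((body, abstract))
    return title_pairs, abstract_pairs
```

Each returned list holds input-summary pairs that could be fed to any sequence-to-sequence or extractive summarizer; records missing a field simply drop out of the corresponding dataset.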
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
