Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature
Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton

TL;DR
This paper introduces two new datasets for automatic lay summarisation of scientific literature, PLOS (large-scale) and eLife (medium-scale), aiming to make research more accessible and understandable to non-experts.
Contribution
It presents two novel biomedical corpora for lay summarisation, characterises the readability and abstractiveness of their lay summaries, and benchmarks them using summarisation models and expert evaluation.
Findings
Datasets support diverse summarisation needs
Lay summaries vary in readability and abstractiveness
Benchmark results highlight key challenges
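Abstractiveness, as mentioned in the findings above, is commonly quantified as the share of summary n-grams that never appear in the source article. The following is a minimal sketch under that assumption; the function names (`tokenize`, `ngrams`, `novel_ngram_ratio`) and the simple regex tokenisation are illustrative choices, not the metric or code used by the authors, which this page does not specify.

```python
import re


def tokenize(text):
    # Lowercase and split into alphanumeric tokens (a deliberately simple assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # Set of all contiguous n-grams in the token list.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(article, summary, n=1):
    # Fraction of distinct summary n-grams that do not occur in the article.
    # Higher values suggest a more abstractive (less copied) summary.
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = ngrams(tokenize(article), n)
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)


if __name__ == "__main__":
    article = "The protein binds to the receptor"
    summary = "The protein sticks to the receptor"
    print(novel_ngram_ratio(article, summary, n=1))
    print(novel_ngram_ratio(article, summary, n=2))
```

Comparing this ratio across two corpora, for example at n=1 and n=2, gives a rough sense of how much each set of lay summaries rewrites rather than copies its source text.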
Abstract
Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling a greater degree of both interdisciplinary knowledge sharing and public understanding when it comes to research findings. However, current corpora for this task are limited in their size and scope, hindering the development of broadly applicable data-driven approaches. Aiming to rectify these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can…
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Biomedical Text Mining and Ontologies
