TL;DR
This paper introduces a new dataset and methods for simplifying complex medical texts at the paragraph level, improving accessibility for lay audiences and advancing automated biomedical text simplification.
Contribution
It provides a large parallel corpus of technical and lay summaries in English, a new metric based on likelihood scores from a masked language model pretrained on scientific texts, and Transformer models trained with jargon penalization to improve readability.
Findings
The new corpus enables training and evaluation of simplification models.
The proposed metric differentiates technical from lay summaries better than existing heuristics.
Jargon penalization improves the readability of simplified texts.
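This summary does not say how the jargon penalty is computed. As a minimal sketch, assuming an unlikelihood-style term added to the usual negative log-likelihood at each decoding step, it could look like the code below. The function name `penalized_loss`, the `jargon_ids` set, and the `weight` parameter are hypothetical, chosen for illustration only.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def penalized_loss(probs, target_id, jargon_ids, weight=1.0):
    """Loss for one decoding step with a penalty on jargon tokens.

    probs      -- next-token probabilities, indexed by token id
    target_id  -- id of the reference (gold) token
    jargon_ids -- hypothetical set of token ids treated as jargon
    weight     -- hypothetical strength of the penalty term
    """
    # Standard negative log-likelihood of the reference token.
    nll = -math.log(max(probs[target_id], EPS))
    # Unlikelihood-style term: grows as the model puts more
    # probability on jargon tokens other than the reference token.
    penalty = -sum(
        math.log(max(1.0 - probs[j], EPS))
        for j in jargon_ids
        if j != target_id
    )
    return nll + weight * penalty
```

With `weight=0` this reduces to ordinary cross-entropy for the step, so the penalty can be tuned or switched off without changing the rest of training.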
Abstract
We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline…
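To make the idea of a likelihood-based metric concrete, here is a minimal sketch that masks each token in turn and averages the log-probability a model assigns to the original token. The exact formulation used in the paper is not given in this excerpt, so the averaging scheme is an assumption. `masked_lm_score`, `token_prob`, and `make_toy_scorer` are hypothetical names, and the toy scorer is a smoothed unigram model standing in for a real masked language model pretrained on scientific texts.

```python
import math
from collections import Counter


def masked_lm_score(tokens, token_prob):
    """Mean log-probability of each token when it alone is masked.

    token_prob(tokens, i) is a hypothetical callable that returns the
    model's probability of tokens[i] given the rest of the sequence.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    total = 0.0
    for i in range(len(tokens)):
        p = token_prob(tokens, i)
        total += math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return total / len(tokens)


def make_toy_scorer(corpus_tokens):
    """Stand-in for a masked LM: add-one smoothed unigram frequencies.

    It ignores context entirely; a real masked LM would condition on
    the unmasked tokens around position i.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def token_prob(tokens, i):
        return (counts[tokens[i]] + 1) / (total + vocab_size)

    return token_prob


# Toy "scientific" corpus; the words are illustrative only.
scorer = make_toy_scorer(
    ["randomized", "trial", "placebo", "randomized", "trial", "efficacy"]
)
technical_score = masked_lm_score(["randomized", "trial"], scorer)
lay_score = masked_lm_score(["the", "study"], scorer)
```

In this toy setup the text that resembles the scoring corpus receives the higher score, which is presumably the intuition behind using a model pretrained on scientific texts to separate technical from lay summaries.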
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Adam · Dropout · Layer Normalization
