Paragraph-level Simplification of Medical Texts

Ashwin Devaraj; Iain J. Marshall; Byron C. Wallace; Junyi Jessy Li

arXiv:2104.05767·cs.CL·April 14, 2021

TL;DR

This paper introduces a new parallel corpus and baseline methods for simplifying jargon-dense medical texts at the paragraph level, with the goal of making biomedical evidence readable for lay audiences.

Contribution

It provides a parallel English corpus of technical and lay summaries, a new metric based on likelihood scores from a masked language model pretrained on scientific texts, and Transformer models augmented with jargon penalization to improve readability.

Findings

01. The new corpus enables training and evaluation of simplification models.

02. The proposed metric differentiates technical from lay summaries better than existing heuristics.

03. Jargon penalization improves the readability of simplified texts.
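
This page does not spell out how the jargon penalty is computed. As a minimal sketch, assuming an unlikelihood-style term added to the usual training loss, the idea can be illustrated as follows. The function names, the `alpha` weight, and the per-token jargon weights are all hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import Mapping


def jargon_penalty(token_probs: Mapping[str, float],
                   jargon_weights: Mapping[str, float]) -> float:
    """Unlikelihood-style penalty: sum over jargon tokens of w * -log(1 - p).

    token_probs:    probability the decoder assigns to each token at one step
                    (hypothetical input; a real model would supply these).
    jargon_weights: non-negative weight per token, larger for more
                    jargon-like tokens (hypothetical input).
    """
    penalty = 0.0
    for token, weight in jargon_weights.items():
        # Clamp so that log(1 - p) stays finite when p is (numerically) 1.
        p = min(token_probs.get(token, 0.0), 1.0 - 1e-9)
        penalty += weight * -math.log(1.0 - p)
    return penalty


def penalized_loss(nll: float,
                   token_probs: Mapping[str, float],
                   jargon_weights: Mapping[str, float],
                   alpha: float = 1.0) -> float:
    """Negative log-likelihood plus the weighted jargon penalty."""
    return nll + alpha * jargon_penalty(token_probs, jargon_weights)


# Illustrative values only: the penalty grows as the decoder puts more
# probability mass on a token that is weighted as jargon.
weights = {"myocardial": 1.0}
low = jargon_penalty({"myocardial": 0.1, "heart": 0.6}, weights)
high = jargon_penalty({"myocardial": 0.5, "heart": 0.4}, weights)
```

Under this assumption, minimizing the combined loss pushes probability away from tokens flagged as jargon while the likelihood term keeps the output faithful to the reference.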

Abstract

We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline…
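
The metric described in the abstract scores a text by how likely a masked language model finds its tokens. A minimal sketch of that idea is shown below. To stay self-contained it replaces the masked language model with a toy smoothed unigram scorer, which is an assumption made purely for illustration; `make_unigram_scorer` and `mean_masked_log_likelihood` are hypothetical names, and a real setup would query a pretrained masked language model instead.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for a masked language model: given the tokens and the
# index of the masked position, return P(original token | rest of the text).
MaskedProb = Callable[[Sequence[str], int], float]


def make_unigram_scorer(corpus_tokens: Sequence[str],
                        smoothing: float = 1.0) -> MaskedProb:
    """Toy scorer that ignores context and uses smoothed unigram frequencies."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def prob(tokens: Sequence[str], i: int) -> float:
        return (counts[tokens[i]] + smoothing) / (total + smoothing * vocab_size)

    return prob


def mean_masked_log_likelihood(tokens: Sequence[str],
                               masked_prob: MaskedProb) -> float:
    """Mask each position in turn and average the log-probability of the
    original token at that position."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    log_probs = [math.log(masked_prob(tokens, i)) for i in range(len(tokens))]
    return sum(log_probs) / len(log_probs)


# Illustrative data only: a scorer fit on "scientific" tokens assigns a higher
# score to text that resembles its training data.
scientific = "randomized controlled trial placebo efficacy randomized trial placebo".split()
scorer = make_unigram_scorer(scientific)
technical_score = mean_masked_log_likelihood("randomized placebo trial".split(), scorer)
lay_score = mean_masked_log_likelihood("people took a dummy pill".split(), scorer)
```

With a scorer trained on scientific text, the gap between the two scores is what lets such a measure separate technical summaries from lay ones.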

Code & Models

Repositories

AshOlogn/Paragraph-level-Simplification-of-Medical-Texts
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Adam · Dropout · Layer Normalization