Multilingual Simplification of Medical Texts
Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh J. Ramanathan, Wei Xu, Byron C. Wallace, and Junyi Jessy Li

TL;DR
This paper introduces a multilingual dataset for simplifying complex medical texts into simpler versions in four languages and evaluates models on this task. The goal is to improve health literacy across language barriers.
Contribution
It presents the first sentence-aligned multilingual medical text simplification dataset and evaluates models in multilingual and zero-shot settings.
Findings
Models can generate viable simplified texts
Multilingual dataset enables cross-language simplification research
Challenges remain in improving model quality and consistency
Abstract
Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, with the result that such evidence would be available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages: English, Spanish, French, and Farsi. We evaluate…
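To make "sentence-aligned multilingual simplification" concrete, here is a minimal sketch of how one aligned record might be represented: a complex source sentence paired with its simplified counterpart in each target language. It assumes the complex side is English and uses ISO 639-1 codes for the four languages; the class name `AlignedExample`, its fields, and the `pairs()` helper are hypothetical illustrations, not the released MultiCochrane format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple

# ISO 639-1 codes for English, Spanish, French, and Farsi (assumed ordering).
LANGUAGES = ("en", "es", "fr", "fa")


@dataclass
class AlignedExample:
    """Hypothetical sentence-aligned record (not the official dataset schema).

    complex_en: one complex source sentence (assumed to be English).
    simple: simplified target sentences, keyed by language code.
    """
    complex_en: str
    simple: Dict[str, str] = field(default_factory=dict)

    def pairs(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (complex source, simplified target, language) triples,
        in LANGUAGES order, skipping languages with no aligned target."""
        for lang in LANGUAGES:
            if lang in self.simple:
                yield self.complex_en, self.simple[lang], lang

    def missing_languages(self) -> Tuple[str, ...]:
        """Return the language codes that have no aligned simplification."""
        return tuple(lang for lang in LANGUAGES if lang not in self.simple)


if __name__ == "__main__":
    # Placeholder strings only; no real dataset content is reproduced here.
    ex = AlignedExample(
        complex_en="<complex English sentence>",
        simple={"en": "<simple English>", "es": "<simple Spanish>"},
    )
    for src, tgt, lang in ex.pairs():
        print(lang, src, "->", tgt)
    print("missing:", ex.missing_languages())
```

Flattening each record with `pairs()` produces one source/target example per language, which is one plausible way to feed a single multilingual sequence-to-sequence model; `missing_languages()` reflects that not every source sentence need have an aligned target in all four languages.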
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques
