MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Chao Jiang, Wei Xu

TL;DR
This paper introduces MedReadMe, a comprehensive dataset and analysis framework for assessing and improving sentence-level readability in medical texts, leveraging fine-grained annotations and large language models.
Contribution
It provides a new dataset with detailed annotations, benchmarks existing readability metrics, and demonstrates how adding jargon span features enhances their accuracy.
Findings
Adding the number of jargon spans as a feature improves the correlation of existing readability metrics with human judgments.
Fine-grained span annotations enable better understanding of medical text complexity.
Benchmarking shows LLM-based methods outperform traditional metrics in medical readability.
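As a rough illustration of the first finding, the sketch below adds a jargon-span count to a baseline readability score. It is not the authors' implementation. The Flesch-Kincaid Grade Level baseline, the vowel-group syllable heuristic, the function names, and the `weight` parameter are all assumptions made for this example; in practice the weight would be fit on annotated data such as MedReadMe rather than chosen by hand.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(sentence: str) -> float:
    """Flesch-Kincaid Grade Level for one sentence (sentence count fixed at 1)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * (syllables / len(words)) - 15.59


def jargon_augmented_score(sentence: str, jargon_spans: list[str],
                           weight: float = 1.0) -> float:
    """Baseline score plus a weighted count of jargon spans.

    `weight` is a hypothetical parameter; it would normally be learned
    from human readability ratings, not set manually.
    """
    return fkgl(sentence) + weight * len(jargon_spans)


if __name__ == "__main__":
    sentence = "The patient presented with myocardial infarction."
    spans = ["myocardial infarction"]  # hand-labeled here for illustration only
    print("baseline: ", round(fkgl(sentence), 2))
    print("augmented:", round(jargon_augmented_score(sentence, spans, 2.0), 2))
```

The jargon spans are passed in explicitly because identifying them is a separate task; here they are assumed to come from human annotation or an automatic jargon identifier.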
Abstract
Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study on fine-grained readability measurements in the medical domain at both sentence-level and span-level. We introduce a new dataset MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotation for 4,520 sentences, featuring two novel "Google-Easy" and "Google-Hard" categories. It supports our quantitative analysis, which covers 650 linguistic features and automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics for the medical domain specifically, which include unsupervised, supervised, and prompting-based methods using recently developed…
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling
