Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation

Yue Guo; Jae Ho Sohn; Gondy Leroy; Trevor Cohen

arXiv:2505.10409 · cs.CL · May 16, 2025

Open Access

TL;DR

This study evaluates LLM-generated plain language summaries of health information. Readers rate them as comparable to human-written summaries on subjective quality, but they lead to lower comprehension than human-written ones, and automated metrics do not reliably track human judgments.

Contribution

The first large-scale evaluation comparing LLM-generated plain language summaries (PLSs) with human-written ones, using both subjective ratings and comprehension tests. It also highlights gaps in current automated evaluation metrics.

Findings

01

In subjective assessments, LLM-generated PLSs are rated similarly to human-written summaries.

02

Human-written PLSs significantly improve reader comprehension over LLM-generated ones.

03

Automated evaluation metrics do not correlate well with human judgments of PLS quality.
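
To make the third finding concrete, one common way to check whether an automated metric tracks human judgment is a rank correlation between the two sets of scores. The sketch below computes Spearman's rho using only the Python standard library. It is an illustration under stated assumptions, not the authors' analysis: the listing does not say which correlation statistic the paper used, and the scores, ratings, and function names here are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data, invented for illustration only (not taken from the paper):
# one automated metric score and one mean human Likert rating per summary.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
human_ratings = [4.0, 3.0, 4.5, 3.5, 2.5]

rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A rho near +1 would mean the metric orders summaries the way human raters do. Values near zero or below, as with these made-up numbers, would mean the metric is a poor stand-in for human judgment.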

Abstract

Plain language summaries (PLSs) are essential for facilitating effective communication between clinicians and patients by making complex medical information easier for laypeople to understand and act upon. Large language models (LLMs) have recently shown promise in automating PLS generation, but their effectiveness in supporting health information comprehension remains unclear. Prior evaluations have generally relied on automated scores that do not measure understandability directly, or subjective Likert-scale ratings from convenience samples with limited generalizability. To address these gaps, we conducted a large-scale crowdsourced evaluation of LLM-generated PLSs using Amazon Mechanical Turk with 150 participants. We assessed PLS quality through subjective Likert-scale ratings focusing on simplicity, informativeness, coherence, and faithfulness; and objective multiple-choice…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling