# Readability Comparison of AI-Generated Versus UpToDate Educational Content on Stroke Management: A Cross-Sectional Study

**Authors:** Saow Renn Ding, Mohammed Ahmed, Tazeen Malik, Rashmitha Somagani, Faizaan Farukh Vohra

PMC · DOI: 10.7759/cureus.98901 · Cureus · 2025-12-10

## TL;DR

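This cross-sectional study compares the readability of stroke education content generated by ChatGPT (GPT-4o) with corresponding content from UpToDate. ChatGPT content was much shorter and used shorter sentences, while standard readability scores (FRE, FKGL, SMOG) did not differ significantly; the authors note that the shorter content also has reduced depth.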

## Contribution

The study provides a formal comparison of linguistic accessibility between AI-generated (ChatGPT, GPT-4o) and peer-reviewed (UpToDate) clinical educational content on stroke, which the authors state had not previously been performed.

## Key findings

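- ChatGPT content was shorter than UpToDate content, with fewer words (median 304 vs. 2772; p = 0.008) and fewer sentences (median 23 vs. 134; p = 0.032).
- UpToDate used more difficult words in absolute terms (median 857 vs. 88; p = 0.008) and had a higher word/sentence ratio (21.7 vs. 13.2; p = 0.008), but the difficult word percentage did not differ significantly (p = 0.690).
- The readability scores FRE (p = 1.000), FKGL (p = 0.222), and SMOG Index (p = 0.151) were not significantly different between the two sources.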

## Abstract

Introduction

Stroke is a major cause of global morbidity and mortality. Readability of educational material is critical for rapid clinical decision-making among healthcare professionals. UpToDate (UpToDate, Inc., Waltham, MA) is a widely used, peer-reviewed point-of-care clinical resource, while ChatGPT (OpenAI, San Francisco, CA) is an emerging AI-based educational support tool. However, a formal comparison of their linguistic accessibility has not been performed.

Objective

To compare the readability and linguistic complexity of educational material on stroke generated by ChatGPT (GPT-4o) versus content retrieved from UpToDate, using validated readability metrics.

Design, setting, and participants

This cross-sectional study was conducted between May 27 and June 4, 2025. ChatGPT (GPT-4o, accessed May 27, 2025) was prompted to generate educational content on stroke. A corresponding section from UpToDate (accessed May 27, 2025) was extracted. Only prose content was analyzed. Readability parameters assessed included total word count, sentence count, word/sentence ratio (average words per sentence), Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) Index, difficult word count, and difficult word percentage. Data were analyzed using IBM SPSS v25 (IBM Corp., Armonk, NY) and R v4.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The two sources were compared using the Mann-Whitney U test, with p < 0.05 considered statistically significant.
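For readers unfamiliar with the metrics named above, the following is a minimal sketch of the standard published formulas for FRE, FKGL, and the SMOG Index. It is not the authors' pipeline: the abstract does not say which tool computed the scores, and the helper names (`estimate_syllables`, `text_counts`) and the vowel-run syllable heuristic are assumptions made for illustration only. Dedicated readability tools use more careful syllable and sentence detection, so their scores will differ somewhat from this sketch.

```python
import math
import re


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables, sentences):
    """SMOG Index from the count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def estimate_syllables(word):
    """Crude heuristic (an assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Count words, sentences, syllables, and polysyllabic words in prose."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [estimate_syllables(w) for w in words]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


if __name__ == "__main__":
    counts = text_counts("Stroke is common. Act fast.")
    # The word/sentence ratio reported in the study is words per sentence.
    print(counts["words"] / counts["sentences"])
    print(flesch_reading_ease(counts["words"], counts["sentences"], counts["syllables"]))
```

Both Flesch formulas depend on average sentence length and average syllables per word with opposite signs, so a text with longer sentences can still score similarly if its words are shorter on average.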

Results

UpToDate content was substantially longer (median = 2772 vs. 304 words; p = 0.008) and used more sentences (median = 134 vs. 23; p = 0.032) and difficult words (median = 857 vs. 88; p = 0.008) compared to ChatGPT. The word/sentence ratio (average words per sentence) was also higher (21.7 vs. 13.2; p = 0.008). However, no statistically significant differences were observed for FRE (p = 1.000), FKGL (p = 0.222), SMOG Index (p = 0.151), or difficult word percentage (p = 0.690).
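The p-values above come from the Mann-Whitney U test. As a rough illustration of how that test works on small samples, here is a sketch of an exact two-sided version in plain Python. The function names and the input values are hypothetical, and the group size of five is an assumption: the abstract does not state how many samples each source contributed. The authors used SPSS and R, not this code.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p) by enumerating all group splits.

    Only practical for small samples, since every way of choosing
    len(x) items from the pooled data is visited.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    mean_u = n1 * n2 / 2

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    u_obs = u_stat(range(n1))
    deviation = abs(u_obs - mean_u)
    total = 0
    extreme = 0
    for indices in combinations(range(n1 + n2), n1):
        total += 1
        if abs(u_stat(indices) - mean_u) >= deviation - 1e-12:
            extreme += 1
    return u_obs, extreme / total


if __name__ == "__main__":
    # Hypothetical, fully separated groups (not data from the study).
    longer = [11, 12, 13, 14, 15]
    shorter = [1, 2, 3, 4, 5]
    print(mann_whitney_exact(longer, shorter))
```

With five values per group and no overlap between groups, the exact two-sided p-value is 2/252, which is about 0.0079. That rounds to the 0.008 reported for several comparisons, but since the abstract does not give the group sizes, the match is only suggestive.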

Conclusions

ChatGPT produces shorter and more concise educational content on stroke while maintaining comparable readability to UpToDate. The lower linguistic density may enhance rapid orientation for trainees; however, the reduced depth indicates ChatGPT should supplement, not replace, established peer-reviewed resources. Future research should explore multiple medical topics, additional AI models, and assess the clinical applicability and accuracy of AI-generated content.

## Linked entities

- **Diseases:** stroke (MONDO:0005098)

## Full-text entities

- **Diseases:** Stroke (MESH:D020521)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12787534/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12787534/full.md

## References

21 references, with the full list in the complete paper: https://tomesphere.com/paper/PMC12787534/full.md

---
Source: https://tomesphere.com/paper/PMC12787534