# Assessment of patient information guides generated by LLMs for common cardiological procedures

**Authors:** Suppraja Soundarrajan, Karine Vartanian, Rahul Bhakle, Thanuja Katakam, Kinnera Dhanwada, Karansher Singh Randhawa, Nikhitha Puvvala

PMC · DOI: 10.21542/gcsp.2025.26 · 2025-05-15

## TL;DR

This study compares patient guides generated by ChatGPT and Google Gemini for three common cardiac procedures. The two tools performed similarly on most measures, but Google Gemini's guides were significantly easier to read.

## Contribution

The paper's contribution is a quantitative evaluation of AI-generated patient information for cardiology procedures using readability, similarity, and reliability metrics.

## Key findings

- ChatGPT and Google Gemini produced responses with similar word counts, sentence counts, and average words per sentence.
- Google Gemini's responses had a significantly better ease score (P = 0.0044), indicating that they were easier to read and understand.
- Grade level, similarity, and reliability scores did not differ significantly between the two AI tools.

## Abstract

Introduction: The use of artificial intelligence (AI) has advanced rapidly in the field of cardiology owing to its ability to process complex data and analyze electrocardiograms, echocardiography, and cardiac testing. AI tools, such as ChatGPT and Google Gemini, can provide evidence-based treatment recommendations using concise language, which can help in the early diagnosis of disease.

Methodology: In this cross-sectional study, patient information brochures for three cardiological procedures (ECG, 2D echocardiography, and exercise stress testing) were generated using ChatGPT and Google Gemini. The total word count, sentence count, average words per sentence, and syllables per word were assessed using the Flesch-Kincaid Calculator. The similarity of the text was determined using the QuillBot plagiarism tool. The reliability of the generated responses was graded using the Modified DISCERN Score, a 5-point rating system that applies a uniform set of standards to assess the accuracy and dependability of consumer health information. Statistical analysis was performed using RStudio v4.3.2. The unpaired t-test was used to compare the responses from the two tools, and the simplicity and reliability scores were compared using Pearson’s correlation coefficient.
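The readability measures named in the methodology follow the standard published Flesch formulas. The sketch below shows how the ease score and grade level derive from word, sentence, and syllable counts. It is an illustration only: the function names are hypothetical, the counts in the usage lines are made up, and the abstract does not describe the calculator's internals (for example, how it counts syllables).

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate US school grade."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts, not data from the study.
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(ease, 3), round(grade, 2))
```

Both formulas depend only on average words per sentence and average syllables per word, so a text with fewer syllables per word earns a higher ease score even when sentence length is unchanged.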

Results: No significant differences were observed between the responses generated by ChatGPT and Google Gemini in word count (P = 0.59), sentence count (P = 0.74), average words per sentence (P = 0.79), grade level (P = 0.06), similarity (P = 0.45), or reliability scores (P = 0.38). However, the ease score was significantly better for Google Gemini-generated responses than for ChatGPT-generated responses (P = 0.0044), indicating that the responses generated by Google Gemini are easier to read and understand.
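For readers unfamiliar with the comparison behind these P values, the following is a minimal sketch of an unpaired two-sample t statistic. It assumes the pooled-variance (Student) form; the abstract does not state whether that variant or Welch's was used. The function name and the sample values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a pooled-variance unpaired t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical scores for two tools across three procedures.
tool_a = [1.0, 2.0, 3.0]
tool_b = [2.0, 3.0, 4.0]
t, df = unpaired_t_statistic(tool_a, tool_b)
print(round(t, 4), df)
```

The P value is then read from the t distribution with the returned degrees of freedom, which statistical software does automatically.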

Conclusions: The study found statistically significant differences between the two tools in average syllables per word and ease score. No significant differences were observed in the number of words, number of sentences, average words per sentence, grade level, similarity, or reliability scores. Future studies should evaluate more AI technologies and cover a wider range of illnesses.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025549/full.md

---
Source: https://tomesphere.com/paper/PMC13025549