# Artificial Intelligence Chatbots in Peritoneal Dialysis Education: A Cross-Sectional Comparative Study of Quality, Readability, and Reliability

**Authors:** Engin Onan, İlter Bozaci, Yelda Deligoz Bildaci, Sevinc Puren Yucel Karakaya, Ruya Kozanoglu, Rumeyza Kazancioglu

PMC · DOI: 10.3390/jcm15020692 · Journal of Clinical Medicine · 2026-01-15

## TL;DR

This study compares AI chatbots for peritoneal dialysis education, finding significant differences in readability and reliability.

## Contribution

First systematic evaluation of AI chatbots for peritoneal dialysis education using quality and readability metrics.

## Key findings

- Gemini Pro 2.5 outperformed other chatbots in readability and reliability scores.
- LLaMA Maverick 4 had the lowest scores across all evaluated metrics.
- Chatbots showed variability in addressing PD-related questions, highlighting the need for clinical oversight.

## Abstract

Background: Peritoneal dialysis (PD) remains underutilized worldwide, partly due to limited patient education, misconceptions, and barriers to accessing reliable health information. Artificial intelligence (AI)-based chatbots have emerged as promising tools for improving health literacy, supporting shared decision-making, and enhancing patient engagement. However, concerns regarding content quality, reliability, and readability persist, and no study to date has systematically evaluated AI-generated content in the context of PD. Therefore, this study aimed to systematically evaluate the quality, reliability, and readability of AI-generated educational content on peritoneal dialysis using multiple large language model-based chatbots. Methods: A total of 45 frequently asked questions about PD were developed by nephrology experts and categorized into three domains: general information (n = 15), technical and clinical issues (n = 21), and myths/misconceptions (n = 9). Three AI-based chatbots, Gemini Pro 2.5, ChatGPT-5, and LLaMA Maverick 4, were prompted to generate responses to all questions. Each response was independently evaluated by two blinded reviewers for textual characteristics, readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL), and content quality/reliability using the Ensuring Quality Information for Patients (EQIP) tool and the Modified DISCERN instrument. Results: Across all domains, significant differences were observed among the chatbots. Gemini Pro 2.5 achieved higher Flesch Reading Ease (FRES) scores (32.6 ± 10.5) compared with ChatGPT-5 (24.2 ± 11.7) and LLaMA Maverick 4 (16.2 ± 7.5; p < 0.001), as well as higher EQIP scores (75.4% vs. 59.4% and 61.5%, respectively; p < 0.001) and Modified DISCERN scores (4.0 [4.0–4.5] vs. 3.0 [3.0–3.5] and 3.0 [2.5–3.5]; p < 0.001). ChatGPT-5 demonstrated intermediate performance, while LLaMA Maverick 4 showed lower scores across evaluated metrics. Conclusions: These findings demonstrate differences among AI-based chatbots in readability, content quality, and reliability when responding to identical peritoneal dialysis–related questions. While AI chatbots may support health literacy and complement clinical decision-making, their outputs should be interpreted with caution and under appropriate clinical oversight. Future research should focus on multilingual, multicenter, and outcome-based studies to ensure the safe integration of AI into PD patient education.

## Full-text entities

- **Chemicals:** Gemini (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12842389/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12842389/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12842389/full.md

---
Source: https://tomesphere.com/paper/PMC12842389