# Guidelines vs. generative AI in CKD patient education: the role of prompt engineering and expert blinded evaluation

**Authors:** Lutfullah Zahit Koc, Sevgi Gulsen Koc, Ayca Inci, Osman Cagin Buldukoglu, Gokhan Koker, Edgar V. Lerma

PMC · DOI: 10.1186/s12882-026-04814-3 · 2026-02-20

## TL;DR

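This study shows that two large language models (ChatGPT-4o mini and Gemini) produced CKD patient education responses that blinded nephrology experts rated higher than guideline-based responses on the CLEAR Tool, and that a structured prompt lowered the reading level of AI responses to roughly 7th grade.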

## Contribution

The study evaluates how prompt engineering affects AI-generated CKD education content, showing that a structured prompt improves readability, and compares AI responses with guideline-based responses under blinded expert review.

## Key findings

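- Both AI models outperformed guideline responses in all CLEAR Tool domains (p < 0.001), with ChatGPT-4o mini scoring highest (median 21.0 vs. 17.0 for Gemini and 13.0 for guidelines).
- Without a structured prompt, AI responses were harder to read than guideline responses (FKGL 11.34 for ChatGPT and 9.62 for Gemini vs. 9.40 for guidelines); with a structured prompt, the required literacy level fell to around 7th grade (FKGL 7.87 and 7.13).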
- Prompt engineering can enhance AI's usability for populations with limited health literacy.

## Abstract

This study aimed to evaluate the accuracy, content quality, and readability of patient education responses related to chronic kidney disease (CKD) generated by large language models (ChatGPT-4o mini and Gemini) compared with guideline-based responses. Fifteen frequently asked CKD-related questions were selected using global Google Trends data and posed to both AI models and guideline-based sources. Responses were anonymized and evaluated by four independent nephrology professors using the CLEAR Tool, which assesses completeness, appropriateness, evidence basis, and clarity. Both AI models significantly outperformed guideline responses across all CLEAR Tool domains (p < 0.001), with ChatGPT-4o mini achieving the highest median score (21.0 [IQR: 5.0] vs. Gemini: 17.0 [IQR: 5.0], Guideline: 13.0 [IQR: 2.0]). Initial readability analysis showed that guideline responses were easier to comprehend (Flesch-Kincaid Grade Level (FKGL): 9.40; Flesch Reading Ease (FRE): 52.01) than AI-generated content (ChatGPT FKGL: 11.34, FRE: 36.17; Gemini FKGL: 9.62, FRE: 46.36). However, when a structured prompt was applied, AI responses demonstrated significant improvements in readability, reducing the required literacy level to approximately the 7th grade (ChatGPT FKGL: 7.87, FRE: 64.23; Gemini FKGL: 7.13, FRE: 61.45). These findings highlight the potential of prompt-guided AI models to generate accurate, accessible educational content for CKD. Prompt engineering emerges as a practical tool to enhance clarity and usability, particularly for populations with limited health literacy. Integration with frameworks like Retrieval-Augmented Generation may further improve reliability and safety in digital health communication.

The online version contains supplementary material available at 10.1186/s12882-026-04814-3.
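
The two readability metrics reported in the abstract have standard published formulas, and a short sketch may help readers interpret the numbers. The code below computes both scores from raw counts of words, sentences, and syllables. The function names and the example counts are illustrative assumptions; the abstract does not say which software the authors used, and real tools differ in how they count syllables and split sentences.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; approximates a US school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not taken from the study).
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"FRE={fre:.2f}, FKGL={fkgl:.2f}")
```

Both formulas depend only on average sentence length and average syllables per word, so shorter sentences and simpler words raise FRE and lower FKGL together. That matches the direction of change the abstract reports after the structured prompt was applied.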

## Linked entities

- **Diseases:** chronic kidney disease (MONDO:0005300)

## Full-text entities

- **Diseases:** CKD (MESH:D012080)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13032336/full.md

---
Source: https://tomesphere.com/paper/PMC13032336