# Evaluation of the reliability and risks of ChatGPT-4o in answering pediatric cough questions: A comparative analysis between pediatricians and pediatric pulmonologists

**Authors:** Hanife Tuğçe Çağlar, Emine Özdemir Kaçer, Sevgi Pekcan, Fatma Nur Ayman, Gauri Mankekar

PMC · DOI: 10.1371/journal.pone.0340007 · PLOS One · 2025-12-31

## TL;DR

This study evaluates how reliable and safe ChatGPT-4o is for answering questions about pediatric cough, comparing ratings from pediatricians and pediatric pulmonologists.

## Contribution

The study compares how two specialties, pediatricians and pediatric pulmonologists, rate the reliability of ChatGPT-4o's answers on pediatric cough management.

## Key findings

- ChatGPT-4o responses were generally rated as trustworthy (median 6.45) and valuable (median 6.15) on a 1–10 scale, but with notable differences between pediatricians and pediatric pulmonologists.
- For no question did more than 50% of participants indicate a problem with the AI-generated answers, although every question was flagged by at least one participant.
- Pediatricians rated the AI responses as more trustworthy, valuable, and comprehensive, and as less dangerous, than pediatric pulmonologists did.

## Abstract

Artificial intelligence tools such as ChatGPT are increasingly used by patients and healthcare professionals, yet their reliability in pediatric respiratory conditions remains unclear. This study aims to assess the trustworthiness, comprehensiveness, value, and potential dangers of ChatGPT-4o-generated responses to frequently asked questions about the management and care of cough in children.

A total of 10 cough-related questions were selected for ChatGPT-4o. The questions and the responses generated by ChatGPT-4o were presented to 32 pediatric pulmonologists and 32 pediatricians. An online questionnaire was developed for this study. Participants rated the answers generated by ChatGPT-4o on a scale of 1–10 in terms of trustworthiness, comprehensiveness, value, and danger. Higher scores indicate higher levels of trustworthiness, comprehensiveness, value, and danger. In addition, a yes/no question asked participants whether there was anything wrong with the answer generated by ChatGPT-4o.

The ChatGPT-4o-generated answers were generally rated by participants as trustworthy (median: 6.45, IQR: 1.97), valuable (median: 6.15, IQR: 2.30), comprehensive (median: 6.15, IQR: 1.83), and not dangerous (median: 4.35, IQR: 2.65). There was a statistically significant difference in all overall ratings between pulmonologists and pediatricians. Pediatricians rated ChatGPT-4o-generated answers as more trustworthy, valuable, and comprehensive, and as less dangerous, than pediatric pulmonologists did. For each of the ten questions, at least one participant indicated that there was something wrong with the ChatGPT-4o-generated response. However, for no question did the proportion of “yes” responses exceed 50%, indicating that concerns were not universally shared among participants.

Our study highlights both the potential benefits and limitations of ChatGPT-4o in providing medical information about pediatric cough. While AI-generated responses were generally rated as trustworthy and valuable, differences in assessment between pediatricians and pediatric pulmonologists emphasize the need for careful interpretation of AI-derived medical content.
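The summary statistics quoted in the abstract (a median and IQR per rating dimension, and the share of "yes" answers per question) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `median_iqr` and `share_yes` are hypothetical, the ratings are invented and are not data from the study, and the quartile method ("inclusive") is a guess, since the abstract does not state how the IQR was computed.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, IQR) for a list of 1-10 ratings.

    Assumption: quartiles use the 'inclusive' method; the paper's
    actual quartile convention is not stated in the abstract.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


def share_yes(flags):
    """Fraction of raters who answered 'yes' (something is wrong)."""
    return sum(flags) / len(flags)


# Invented ratings for illustration only; NOT data from the study.
example_scores = [5, 6, 6, 7, 7, 8, 4, 6]
example_flags = [True, False, False, True, False, False, False, False]

med, iqr = median_iqr(example_scores)

# The abstract's criterion: did more than 50% of raters flag a problem?
majority_flagged = share_yes(example_flags) > 0.5

print(f"median={med}, IQR={iqr}, majority flagged={majority_flagged}")
```

Applied per question and per rating dimension, the same two helpers would yield the kind of figures reported above; comparing the two specialties would additionally require a significance test, which the abstract does not name.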

## Full-text entities

- **Diseases:** cough (MESH:D003371)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12755730/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12755730/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12755730/full.md

---
Source: https://tomesphere.com/paper/PMC12755730