# When AI speaks like a specialist: ChatGPT-4 in the management of inflammatory bowel disease

**Authors:** Elena De Cristofaro, Francesca Zorzi, Maria Abreu, Alice Colella, Giovanna Del Vecchio Blanco, Gionata Fiorino, Elisabetta Lolli, Nurulamin Noor, Loris Riccardo Lopetuso, Mathieu Pioche, Jean Grimaldi, Omero Alessandro Paoluzi, Joana Roseira, Giorgia Sena, Edoardo Troncone, Emma Calabrese, Giovanni Monteleone, Irene Marafini

PMC · DOI: 10.3389/frai.2025.1678320 · Frontiers in Artificial Intelligence · 2025-10-10

## TL;DR

ChatGPT-4 provides high-quality, clear, and actionable responses to inflammatory bowel disease questions, often outperforming human experts and showing potential for patient education.

## Contribution

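The study evaluates ChatGPT-4's answers to common IBD patient questions against those of expert gastroenterologists, using anonymized physician ratings, and shows that it outperforms the human experts in several dimensions.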

## Key findings

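- ChatGPT-4 responses received significantly higher overall scores than those from human experts (mean 4.28 vs. 4.05; p < 0.001), with advantages in accuracy, reliability, and actionability.
- Evaluators correctly identified only 33% of AI-generated responses as such, so these responses were frequently indistinguishable from those written by physicians.
- Diet-related scenarios received consistently lower scores than other themes; medical therapy and surgery were the best-rated scenarios.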

## Abstract

Artificial intelligence (AI) is gaining traction in healthcare, especially for patient education. Inflammatory bowel diseases (IBD) require continuous engagement, yet the quality of online information accessed by patients is inconsistent. ChatGPT, a generative AI model, has shown promise in medical scenarios, but its role in IBD communication needs further evaluation. The objective of this study was to assess the quality of ChatGPT-4’s responses to common patient questions about IBD, compared to those provided by experienced IBD specialists.

Twenty-five frequently asked questions were collected during routine IBD outpatient visits and categorized into five themes: pregnancy/breastfeeding, diet, vaccinations, lifestyle, and medical therapy/surgery. Each question was answered by ChatGPT-4 and by two expert gastroenterologists. Responses were anonymized and evaluated by 12 physicians (six IBD experts and six non-experts) using a 5-point Likert scale across four dimensions: accuracy, reliability, comprehensibility, and actionability. Evaluators also attempted to identify whether responses were AI- or human-generated.
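The rating design described above can be pictured as a small data structure. The following is a minimal sketch, not the authors' analysis code: the `Rating` record, the function names, and the toy data are hypothetical, and it assumes that an "overall score" is the plain mean of the four Likert dimensions, which the abstract does not state.

```python
from dataclasses import dataclass
from statistics import mean

# The four dimensions named in the abstract, each rated on a 5-point Likert scale.
DIMENSIONS = ("accuracy", "reliability", "comprehensibility", "actionability")


@dataclass(frozen=True)
class Rating:
    """One evaluator's assessment of one anonymized response (hypothetical record)."""
    question_id: int
    source: str          # true origin: "ai" or "human" (hidden from the evaluator)
    evaluator_id: int
    scores: dict         # dimension name -> Likert score from 1 to 5
    guessed_source: str  # the evaluator's guess: "ai" or "human"


def overall_score(rating):
    """Mean of the four dimension scores (an assumed definition of 'overall')."""
    values = [rating.scores[d] for d in DIMENSIONS]
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("Likert scores must lie between 1 and 5")
    return mean(values)


def mean_by_source(ratings):
    """Average overall score for each response source."""
    grouped = {}
    for r in ratings:
        grouped.setdefault(r.source, []).append(overall_score(r))
    return {source: mean(values) for source, values in grouped.items()}


def identification_rate(ratings, source):
    """Fraction of ratings of the given source whose origin was guessed correctly."""
    subset = [r for r in ratings if r.source == source]
    if not subset:
        raise ValueError(f"no ratings for source {source!r}")
    return sum(r.guessed_source == r.source for r in subset) / len(subset)


# Toy data for illustration only; these are NOT values from the study.
example_ratings = [
    Rating(1, "ai", 1, dict(zip(DIMENSIONS, (5, 4, 5, 4))), "human"),
    Rating(1, "human", 1, dict(zip(DIMENSIONS, (4, 4, 4, 4))), "human"),
    Rating(1, "ai", 2, dict(zip(DIMENSIONS, (4, 4, 4, 4))), "ai"),
]
```

Calling `mean_by_source(example_ratings)` and `identification_rate(example_ratings, "ai")` gives the two kinds of summary reported in the abstract: a mean score per source and the share of AI responses recognized as such. The statistical test behind the reported p-value is not specified in this summary, so it is left out of the sketch.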

ChatGPT-4 responses received significantly higher overall scores than those from human experts (mean 4.28 vs. 4.05; p < 0.001). The best-rated scenarios were medical therapy and surgery; the diet scenario consistently received lower scores. Only 33% of AI-generated responses were correctly identified as such, indicating strong similarity to human-written answers. Both expert and non-expert evaluators rated AI responses highly, though IBD specialists gave higher ratings overall.

ChatGPT-4 generated high-quality, clear, and actionable responses to IBD-related patient questions, often outperforming human experts. Its outputs were frequently indistinguishable from those written by physicians, suggesting potential as a supportive tool for patient education. Nonetheless, further studies are needed to assess real-world application and ensure appropriate use in personalized clinical care.

## Linked entities

- **Diseases:** inflammatory bowel disease (MONDO:0005265)

## Full-text entities

- **Diseases:** IBD (MESH:D015212)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12549657/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12549657/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/PMC12549657/full.md

---
Source: https://tomesphere.com/paper/PMC12549657