# Patient Education in Bariatric Surgery: Can Artificial Intelligence–Based Chatbots Bridge the Knowledge Gap?

**Authors:** Amirreza Izadi, Hesam Mosavari, Ali Hosseininasab, Ali Jaliliyan, Arzhang Jafari, Mohammadhosein Akhlaghpasand, Aghil Rostami, Maziar Moradi-Lakeh, Foolad Eghbali

PMC · DOI: 10.1155/jobe/2376530 · 2026-02-12

## TL;DR

AI chatbots can provide accurate information about bariatric surgery, but their answers are often hard to read and should complement, not replace, professional advice.

## Contribution

This study evaluates how well AI chatbots answer common patient questions about bariatric surgery and compares their performance with that of medical experts.

## Key findings

- AI chatbots outperformed medical experts in accuracy and comprehensiveness of answers.
- ChatGPT-4 performed best among chatbots, while Llama performed worst.
- Chatbot responses were often too complex for general readers and varied in reliability.

## Abstract

The global obesity epidemic challenges health systems, driving people to seek metabolic and bariatric surgery (MBS), especially laparoscopic sleeve gastrectomy (LSG). Many MBS centers have limited resources for patient education, creating knowledge gaps that lead patients to search online. AI chatbots, such as ChatGPT, can provide reliable medical information, though concerns about accuracy and completeness remain.

The study involved four fellowship‐trained minimally invasive surgeons (MISs), nine fellows (MIFs), and two general practitioners (GPs) from the MBS multidisciplinary team, and it ran from March 1, 2024, to March 30, 2024. Seven AI chatbots were selected based on their public availability on December 1, 2023: ChatGPT 3.5 and 4, Bard, Bing, Claude, Llama, and Perplexity. Forty patient questions about LSG were sourced from social media, MBS organizations, and online forums. Experts and chatbots answered these questions, and their responses were rated for accuracy and comprehensiveness on a 5‐point scale. Statistical analyses compared performance across groups.

Chatbots achieved a higher overall performance score (2.55 ± 0.95) than the expert group (1.92 ± 1.32, p < 0.001). Among chatbots, ChatGPT‐4 achieved the highest performance (2.94 ± 0.24), while Llama had the lowest (2.15 ± 1.23). Within the expert group, scores were highest for MISs (2.36 ± 1.09), followed by GPs (1.90 ± 1.36) and MIFs (1.75 ± 1.36). The readability of chatbot responses was assessed using Flesch–Kincaid scores, which showed that most responses required a reading level between the 11th grade and college. Chatbots also showed fair reliability and reproducibility across repeated responses, with ChatGPT‐4 showing the highest test–retest reliability.
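For readers unfamiliar with the readability metric, the Flesch–Kincaid grade level is computed from average sentence length and average syllables per word: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The paper does not say which tool produced its scores, so the sketch below is only an illustration under that caveat. It counts syllables with a crude vowel-group heuristic (an assumption, not a dictionary lookup), so its numbers will differ somewhat from dedicated readability software. The function names `count_syllables`, `fk_grade_from_counts`, and `fk_grade` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not dictionary-based)."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    # Drop a typical silent final 'e' ("make"), but keep "-le" endings ("table").
    if n > 1 and w.endswith("e") and not w.endswith("le"):
        n -= 1
    return max(n, 1)


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text: str) -> float:
    """Estimate the grade level of a text using naive sentence and word splitting."""
    # Hypothetical tokenization: sentences end at '.', '!' or '?'; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    sample = ("Sleeve gastrectomy removes part of the stomach. "
              "Recovery usually takes a few weeks.")
    print(round(fk_grade(sample), 1))
```

A score of about 11 or higher corresponds to the "11th grade to college" range reported for most chatbot answers. Long sentences and polysyllabic medical terms both push the score up, which is why plain-language rewriting lowers it.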

AI chatbots generated accurate and comprehensive answers to common bariatric patient questions, suggesting promise as a scalable aid for patient education. However, readability often exceeds recommended levels, performance varies by model, occasional inaccuracies occur, and medicolegal considerations remain unresolved. Accordingly, chatbots should complement clinician counseling, and future work should improve readability and reliability and evaluate real‐world safety and impact.

## Linked entities

- **Diseases:** obesity (MONDO:0011122)

## Full-text entities

- **Diseases:** obesity (MESH:D009765)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12902178/full.md

---
Source: https://tomesphere.com/paper/PMC12902178