# Efficacy of ChatGPT in personalized glucose-lowering strategy development: a clinician-based comparative study

**Authors:** Yu Wang, Cheng-Lin Zhang, Huijuan Zhao, Chang Wang, Lin Guo, Pengfei Wei, Mingyue Jin, Aiping Li, Qiang Li, Hongyan Pan

PMC · DOI: 10.3389/fendo.2026.1693381 · Frontiers in Endocrinology · 2026-03-11

## TL;DR

This study compares diabetes treatment plans produced by ChatGPT-4o with those written by human doctors and finds that the model performs well on simple cases but needs improvement on complex ones.

## Contribution

First study to compare glucose-lowering strategies generated by ChatGPT-4o with those of physicians for real-world diabetes cases.

## Key findings

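- ChatGPT-4o passed every section of China's qualification exam for attending physicians in endocrinology, scoring above the 60% threshold.
- Its glucose-lowering strategies scored comparably to those of general practitioners but lower than those of attending physicians in endocrinology.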
- Performance dropped significantly with increasing case complexity.

## Abstract

The increasing incidence of diabetes poses a significant burden on healthcare systems. Limited research exists on tools to assist providers in developing personalized glucose-lowering strategies, which could alleviate this pressure and enhance patient outcomes.

This study aims to evaluate the capability of ChatGPT-4o in developing personalized glucose-lowering strategies for individuals with diabetes.

First, ChatGPT-4o’s performance was evaluated on China’s qualification examination for attending physicians in endocrinology. Second, a cross-sectional study compared the glucose-lowering strategies formulated by ChatGPT-4o, general practitioners (GPs), and attending physicians (APs) in endocrinology for a set of 30 real-world diabetes cases. Three clinical experts blindly scored the reasonableness of each strategy on a rating scale. Cases were stratified into three complexity levels (A, B, and C), and mean scores were evaluated for each level.
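The abstract does not state how the three experts' ratings were combined or which statistics were computed beyond the mean and SD. The sketch below assumes that each case's score is the average over the raters and then reports a mean ± SD of those per-case scores for each complexity level and strategy source. The record layout, the function name `summarize_by_level`, and every score in `records` are hypothetical toy values, not study data.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_level(records, source):
    """Hypothetical aggregation: average each case over raters, then mean/SD per level.

    records: iterable of (case_id, complexity_level, source, rater, score) tuples
             (an assumed layout, not the study's actual data format).
    source:  which strategy author to summarize, e.g. "ChatGPT-4o" or "GP".
    """
    per_case = defaultdict(list)  # (level, case_id) -> raw scores from each rater
    for case_id, level, src, _rater, score in records:
        if src == source:
            per_case[(level, case_id)].append(score)

    per_level = defaultdict(list)  # level -> list of per-case mean scores
    for (level, _case_id), scores in per_case.items():
        per_level[level].append(mean(scores))

    summary = {}
    for level in sorted(per_level):
        case_means = per_level[level]
        # stdev needs at least two cases; report 0.0 for a single-case level
        sd = stdev(case_means) if len(case_means) > 1 else 0.0
        summary[level] = (mean(case_means), sd)
    return summary

# Toy records (hypothetical scores, for illustration only)
records = [
    ("c1", "A", "ChatGPT-4o", "R1", 90), ("c1", "A", "ChatGPT-4o", "R2", 88), ("c1", "A", "ChatGPT-4o", "R3", 92),
    ("c2", "A", "ChatGPT-4o", "R1", 86), ("c2", "A", "ChatGPT-4o", "R2", 90), ("c2", "A", "ChatGPT-4o", "R3", 88),
    ("c3", "C", "ChatGPT-4o", "R1", 70), ("c3", "C", "ChatGPT-4o", "R2", 74), ("c3", "C", "ChatGPT-4o", "R3", 72),
    ("c4", "C", "ChatGPT-4o", "R1", 80), ("c4", "C", "ChatGPT-4o", "R2", 78), ("c4", "C", "ChatGPT-4o", "R3", 82),
    ("c1", "A", "GP", "R1", 80),
]

for level, (m, sd) in summarize_by_level(records, "ChatGPT-4o").items():
    print(f"Level {level}: {m:.2f} ± {sd:.3f}")
```

Averaging over raters before computing the SD treats the case as the unit of analysis, which matches the per-level means reported in the abstract. A study could instead pool all rater scores, and that choice would change the SD.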

ChatGPT-4o passed all sections of the qualification examination with scores above the 60% threshold. In developing glucose-lowering strategies, ChatGPT-4o achieved a mean score comparable to that of GPs (82.24 ± 9.933 vs 79.83 ± 3.768; p = .317) but lower than that of APs (82.24 ± 9.933 vs 86.35 ± 4.142; p = .0467). Performance declined with increasing case complexity, with mean scores dropping from 89.90 ± 2.936 for simple cases (A-level) to 76.12 ± 11.93 for complex cases (C-level) (p < .0020).

ChatGPT-4o performs reliably in generating glucose-lowering strategies for simpler diabetes cases, highlighting its potential to assist community health workers. However, its accuracy in complex cases, especially concerning medication contraindications, requires improvement.

## Linked entities

- **Diseases:** diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** diabetes (MESH:D003920)
- **Chemicals:** glucose (MESH:D005947), ChatGPT (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012919/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13012919/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012919/full.md

---
Source: https://tomesphere.com/paper/PMC13012919