Comparing Artificial Intelligence and Obstetrics Residents in Answering Standardized Patient Questions Regarding Gestational Diabetes
Azam Faraji, Hossein Faramarzi, Mahsa Razeghi, Nasrin Asadi, Homeira Vafaei, Maryam Kasraeian

TL;DR
This study compared AI chatbots and medical residents in answering questions about gestational diabetes, finding that the chatbots were more accurate than the residents and that GPT-4o and DeepSeek V3 0324 also gave more complete answers.
Contribution
Demonstrates that AI models outperform residents in answering gestational diabetes questions, suggesting potential for medical education and clinical support.
Findings
AI models had significantly higher accuracy than residents in answering GDM-related questions.
GPT-4o and DeepSeek V3 0324 showed significantly higher completeness scores than residents.
DeepSeek V3 0324 achieved the highest scores for both accuracy and completeness.
Abstract
Introduction: This study evaluated the performance of three artificial intelligence (AI) chatbots, GPT-3.5 (OpenAI, San Francisco, USA), GPT-4o (OpenAI, San Francisco, USA), and DeepSeek V3 0324 (DeepSeek AI, Beijing, China), against eight gynecology residents in answering standardized patient questions on gestational diabetes mellitus (GDM), comparing the accuracy and completeness of their responses.
Methods: Twenty-four questions were answered by the three chatbots and the eight residents. Two faculty members independently rated the responses for accuracy and completeness using a 5-point scale. Independent-samples t-tests were used for statistical analysis.
Results: The mean accuracy scores were 3.64 for residents, 4.67 for GPT-3.5, 4.69 for GPT-4o, and 4.81 for DeepSeek V3…
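As an illustration of the statistical method named in the Methods, the following is a minimal sketch of an independent-samples t-test in Python. It assumes the equal-variance (Student) form with pooled variance; the abstract does not say which variant was used or how the two raters' scores were combined. The function name `independent_t` and both score lists are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Return (t, df) for Student's independent-samples t-test.

    Assumes equal variances in both groups (pooled variance); this is an
    assumption of the sketch, not a detail confirmed by the abstract.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Standard error of the difference between the two group means
    se = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical ratings on a 5-point scale (illustrative only, NOT study data)
chatbot_scores = [5, 5, 4, 5, 5, 4]
resident_scores = [4, 3, 4, 3, 4, 3]

t_stat, df = independent_t(chatbot_scores, resident_scores)
print(f"t = {t_stat:.2f}, df = {df}")
```

The resulting t statistic would then be compared against the t distribution with `df` degrees of freedom to obtain a p-value; that step is left out here to keep the sketch within the standard library.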
Figure 1
Figure 2
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
