Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology

Dou Liu; Ying Long; Sophia Zuoqiu; Tian Tang; Rong Yin

arXiv:2504.00061 · cs.CL · April 2, 2025 · 3 citations

Open Access

TL;DR

This study assesses the potential of large language models, specifically ChatGPT variants, to automate medical history-taking in infertility cases, showing promising accuracy and completeness but highlighting the need for further validation.

Contribution

It introduces an AI-driven conversational system using ChatGPT-4o and ChatGPT-4o-mini for infertility history-taking and compares their performance on real-world cases.

Findings

01 ChatGPT-4o-mini outperforms ChatGPT-4o in information extraction accuracy

02 Both models show strong feasibility in automating infertility history-taking

03 ChatGPT-4o-mini achieves higher completeness in medical histories

Abstract

Effective physician-patient communication in pre-diagnostic settings, particularly in complex and sensitive areas such as infertility, is critical but time-consuming, which makes clinic workflows inefficient. Recent advances in Large Language Models (LLMs) offer a potential way to automate conversational medical history-taking and improve diagnostic accuracy. This study evaluates the feasibility and performance of LLMs on these tasks for infertility cases. An AI-driven conversational system was developed to simulate physician-patient interactions with ChatGPT-4o and ChatGPT-4o-mini. A total of 70 real-world infertility cases were processed, generating 420 diagnostic histories. Model performance was assessed using F1 score, Differential Diagnosis (DDs) Accuracy, and Accuracy of Infertility Type Judgment (ITJ). ChatGPT-4o-mini…
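The abstract lists F1 score among its metrics but, as excerpted here, does not say how it was computed. As an illustration only, the sketch below shows one common way to score information extraction: treat the reference history and the model-generated history as sets of items and compute F1 from their overlap. The function name `f1_score`, the `field:value` item format, and the sample data are all hypothetical and are not taken from the paper.

```python
def f1_score(extracted: set[str], reference: set[str]) -> float:
    """Set-based F1 between extracted items and reference items.

    Assumption: each history is reduced to a set of comparable strings.
    The paper's actual matching procedure may differ.
    """
    if not extracted and not reference:
        return 1.0  # nothing to find and nothing reported
    true_positives = len(extracted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, invented for illustration.
reference = {
    "age:32",
    "duration_of_infertility:2y",
    "prior_pregnancy:no",
    "cycle:regular",
}
extracted = {
    "age:32",
    "duration_of_infertility:2y",
    "cycle:irregular",
}

score = f1_score(extracted, reference)
print(f"F1 = {score:.3f}")
```

Here two of the three extracted items match the four reference items, so precision is 2/3, recall is 1/2, and F1 is 4/7. Exact string matching is a deliberately simple choice; a real evaluation of free-text histories would likely need some normalization or clinician judgment to decide what counts as a match.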

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · AI in cancer detection · Machine Learning in Healthcare