Computerized diagnostic decision support systems—Isabel Pro versus ChatGPT-4 part II

Joe M Bridges; Xiaoqian Jiang; Michael Ige; Oluwatoniloba Toyobo

PMC · DOI: 10.1093/jamiaopen/ooaf048 · June 16, 2025

Open Access

TL;DR

This study compares the diagnostic accuracy and consistency of ChatGPT-4 and Isabel Pro and finds limitations in ChatGPT-4's reliability for medical diagnosis.

Contribution

The study evaluates ChatGPT-4's diagnostic performance under novel prompting strategies and different expert panel sizes, revealing challenges in reproducibility and accuracy.

Findings

1. ChatGPT-4 showed improved recall but produced fewer correct diagnoses than Isabel Pro.
2. Reconsidering Isabel Pro's differential improved ChatGPT-4's recall by 11%.
3. ChatGPT-4's reference citation accuracy was low: 34.8% for citations and 37.8% for DOIs.

Abstract

The study asked four questions. Do a Tree-of-Thought prompt and reconsideration of Isabel Pro's differential improve ChatGPT-4's accuracy? Does increasing the expert panel size improve ChatGPT-4's accuracy? Does ChatGPT-4 produce consistent outputs across sequential requests? What is the frequency of fabricated references?

The systems compared were Isabel Pro, a computerized diagnostic decision support system, and ChatGPT-4, a large language model. For 201 cases from the New England Journal of Medicine, each system produced a differential diagnosis ranked by likelihood. The metrics were Mean Reciprocal Rank (MRR), Recall at Rank, Average Rank, Number of Correct Diagnoses, and Rank Improvement. To assess reproducibility, the study compared the initial expert panel run with each subsequent run, using the r-squared value from a scatter plot of each run.

ChatGPT-4 improved MRR and Recall at 10 to 0.72 but produced fewer correct diagnoses and lower average…
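The abstract names its ranking metrics but not their formulas. The sketch below shows one common way to compute them in Python, assuming each case has already been reduced to the 1-based rank at which the correct diagnosis appears in a system's differential (None when it is absent). The function names, the reciprocal rank of 0 for a missing diagnosis, restricting Average Rank to cases where the diagnosis was found, and correlating per-case reciprocal ranks for the r-squared reproducibility check are all assumptions for illustration, not the authors' documented procedure. For a least-squares line on a scatter plot, r-squared equals the squared Pearson correlation, which is what `r_squared` computes.

```python
from math import fsum
from typing import Optional, Sequence

# Assumption: one entry per case, the 1-based rank of the correct diagnosis
# in a system's differential, or None if it does not appear at all.
Rank = Optional[int]


def reciprocal_rank(rank: Rank) -> float:
    """1/rank for a hit; 0.0 when the correct diagnosis is missing (assumed)."""
    return 0.0 if rank is None else 1.0 / rank


def mean_reciprocal_rank(ranks: Sequence[Rank]) -> float:
    """Average of the per-case reciprocal ranks (MRR)."""
    return fsum(reciprocal_rank(r) for r in ranks) / len(ranks)


def recall_at(ranks: Sequence[Rank], k: int) -> float:
    """Fraction of cases whose correct diagnosis is ranked within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def correct_count(ranks: Sequence[Rank]) -> int:
    """Cases where the correct diagnosis appears anywhere in the differential."""
    return sum(1 for r in ranks if r is not None)


def average_rank(ranks: Sequence[Rank]) -> float:
    """Mean rank over found cases only (assumed); raises if nothing was found."""
    hits = [r for r in ranks if r is not None]
    return fsum(hits) / len(hits)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of a least-squares line through (x, y), i.e. Pearson r squared.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    sxy = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = fsum((x - mx) ** 2 for x in xs)
    syy = fsum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Toy ranks for five invented cases; these are NOT data from the study.
    first_run = [1, 3, None, 2, 10]
    repeat_run = [1, 2, None, 4, 8]
    print("MRR:", mean_reciprocal_rank(first_run))
    print("Recall at 10:", recall_at(first_run, 10))
    print("Correct diagnoses:", correct_count(first_run))
    print("Average rank:", average_rank(first_run))
    # Hypothetical reproducibility check: per-case reciprocal ranks of the
    # first run plotted against a repeat run.
    rr_first = [reciprocal_rank(r) for r in first_run]
    rr_repeat = [reciprocal_rank(r) for r in repeat_run]
    print("r-squared, first vs repeat:", r_squared(rr_first, rr_repeat))
```

Working from precomputed ranks keeps the metric arithmetic separate from the harder step of judging whether a listed diagnosis matches a case's final diagnosis, which this sketch does not attempt.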

Figures (2)

Taxonomy

Topics: Clinical Reasoning and Diagnostic Skills · Machine Learning in Healthcare · Sepsis Diagnosis and Treatment