Towards Accurate Differential Diagnosis with Large Language Models

Daniel McDuff; Mike Schaekermann; Tao Tu; Anil Palepu; Amy Wang; Jake Garrison; Karan Singhal; Yash Sharma; Shekoofeh Azizi; Kavita Kulkarni; Le Hou; Yong Cheng; Yun Liu; S Sara Mahdavi; Sushant Prakash; Anupam Pathak; Christopher Semturs; Shwetak Patel; Dale R Webster; Ewa Dominowska; Juraj Gottweis; Joelle Barral; Katherine Chou; Greg S Corrado; Yossi Matias; Jake Sunshine; Alan Karthikesalingam; Vivek Natarajan

arXiv:2312.00164 · cs.CY · December 4, 2023 · 62 cites

Open Access

TL;DR

This study introduces a specialized large language model for medical differential diagnosis that outperforms unassisted clinicians and improves diagnostic accuracy when used as an aid, demonstrating potential to enhance clinical decision-making.

Contribution

We developed and evaluated a diagnostic reasoning LLM that surpasses unassisted clinicians and improves diagnostic accuracy when used as an assistive tool in challenging cases.

Findings

01

The LLM alone outperformed unassisted clinicians in top-10 accuracy (59.1% vs 33.6%).

02

Clinicians assisted by the LLM achieved higher top-10 accuracy (51.7%) than clinicians without its assistance (36.1%).

03

LLM assistance led to more comprehensive differential lists.
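
The findings above are reported as top-10 accuracy: the fraction of cases in which the correct diagnosis appears among the first ten entries of a ranked differential list. A minimal sketch of that computation follows. It assumes a match can be decided by case-insensitive string equality, which is a simplification introduced here and not the paper's matching procedure; the function name `top_k_accuracy` and the toy cases are hypothetical and not data from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_lists: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of cases whose true diagnosis is in the first k list entries.

    Assumption: a match is exact string equality after trimming whitespace
    and lowercasing. Real evaluations typically need a more careful judgment
    of whether two diagnoses match.
    """
    if len(ranked_lists) != len(truths):
        raise ValueError("ranked_lists and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")

    hits = 0
    for ddx, truth in zip(ranked_lists, truths):
        # Only the first k entries of the ranked differential count.
        candidates = {d.strip().lower() for d in ddx[:k]}
        if truth.strip().lower() in candidates:
            hits += 1
    return hits / len(truths)


# Hypothetical toy cases for illustration only (not from the study).
toy_ddx = [
    ["sarcoidosis", "lymphoma", "tuberculosis"],
    ["migraine", "tension headache"],
    ["gout", "septic arthritis", "pseudogout"],
    ["pneumonia", "pulmonary embolism"],
]
toy_truths = [
    "Lymphoma",
    "giant cell arteritis",
    "pseudogout",
    "pulmonary embolism",
]

print(top_k_accuracy(toy_ddx, toy_truths, k=10))
print(top_k_accuracy(toy_ddx, toy_truths, k=1))
```

Lowering `k` makes the metric stricter, since the correct diagnosis must then rank nearer the top of the list to count.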

Abstract

An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline,…

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling