ChatGPT and post-test probability
Samuel J. Weisenthal

TL;DR
This paper evaluates ChatGPT's ability to perform probabilistic medical diagnostic reasoning, specifically Bayesian updating, and explores how prompt engineering can improve its accuracy in this domain.
Contribution
It systematically probes ChatGPT's performance in Bayesian medical diagnosis tasks and demonstrates how prompt engineering can mitigate errors.
Findings
ChatGPT makes errors when using medical variable names.
Prompt engineering can partially reduce these errors.
Results inform future research on LLMs in healthcare.
Abstract
Reinforcement learning-based large language models, such as ChatGPT, are believed to have potential to aid human experts in many domains, including healthcare. There is, however, little work on ChatGPT's ability to perform a key task in healthcare: formal, probabilistic medical diagnostic reasoning. This type of reasoning is used, for example, to update a pre-test probability to a post-test probability. In this work, we probe ChatGPT's ability to perform this task. In particular, we ask ChatGPT to give examples of how to use Bayes rule for medical diagnosis. Our prompts range from queries that use terminology from pure probability (e.g., requests for a posterior of A given B and C) to queries that use terminology from medical diagnosis (e.g., requests for a posterior probability of Covid given a test result and cough). We show how the introduction of medical variable names leads to an…
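As a rough illustration of the pre-test to post-test update the abstract refers to, the sketch below applies Bayes' rule to a single binary test. The function name `post_test_probability` and the numbers used (10% pre-test probability, 90% sensitivity, 80% specificity) are assumptions chosen for illustration; they are not taken from the paper or its prompts.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Bayes' rule for one binary diagnostic test (illustrative sketch).

    pre_test:    P(disease) before the test
    sensitivity: P(test positive | disease)
    specificity: P(test negative | no disease)
    positive:    whether the observed test result was positive
    """
    if positive:
        p_result_given_disease = sensitivity
        p_result_given_healthy = 1.0 - specificity
    else:
        p_result_given_disease = 1.0 - sensitivity
        p_result_given_healthy = specificity

    # Numerator: P(result | disease) * P(disease)
    numerator = p_result_given_disease * pre_test
    # Denominator: total probability of the observed result
    evidence = numerator + p_result_given_healthy * (1.0 - pre_test)
    return numerator / evidence


# Hypothetical numbers, not from the paper.
posterior = post_test_probability(pre_test=0.10, sensitivity=0.90, specificity=0.80)
print(f"{posterior:.3f}")  # 0.333
```

With these assumed values, a positive result raises the probability of disease from 10% to about 33%, because false positives among the much larger disease-free group still outnumber true positives.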
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Text Readability and Simplification
