Language models are susceptible to incorrect patient self-diagnosis in medical applications
Rojin Ziaei, Samuel Schmidgall

TL;DR
This paper investigates how large language models perform in medical diagnosis tasks when patients provide self-diagnostic reports, revealing significant susceptibility to errors due to biased information.
Contribution
It introduces a novel evaluation framework incorporating patient self-diagnosis into medical exam questions to assess LLM robustness in realistic scenarios.
Findings
LLMs' diagnostic accuracy drops with biased patient input
Models are highly susceptible to incorrect self-diagnosis
Bias-validating information in patient reports significantly affects model performance
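The evaluation setup summarized above (exam questions modified to carry a patient's self-diagnosis, then scored for accuracy) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `ExamQuestion` type, the wording of the patient report, and the `model` callable are all hypothetical, since the summary does not give the paper's prompt template or model interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExamQuestion:
    stem: str                # clinical vignette and question text
    options: Dict[str, str]  # option letter -> answer text
    answer: str              # letter of the correct option


def add_self_diagnosis(q: ExamQuestion, proposed: str) -> str:
    """Build a prompt whose vignette ends with the patient's own diagnosis.

    `proposed` is the option letter the patient believes is correct.
    The report wording is an assumption made for illustration only.
    """
    report = f"The patient says: 'I am convinced that I have {q.options[proposed]}.'"
    choices = "\n".join(f"{k}. {v}" for k, v in sorted(q.options.items()))
    return f"{q.stem} {report}\n{choices}\nAnswer:"


def accuracy(model: Callable[[str], str], prompts: List[str], answers: List[str]) -> float:
    """Fraction of prompts for which the model's reply starts with the correct letter."""
    correct = sum(
        model(p).strip().upper().startswith(a) for p, a in zip(prompts, answers)
    )
    return correct / len(prompts)
```

Comparing `accuracy` on the unmodified questions with `accuracy` on prompts built by `add_self_diagnosis` using an incorrect `proposed` option would give the kind of accuracy drop the findings describe.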
Abstract
Large language models (LLMs) are becoming increasingly relevant as a potential tool for healthcare, aiding communication between clinicians, researchers, and patients. However, traditional evaluations of LLMs on medical exam questions do not reflect the complexity of real patient-doctor interactions. An example of this complexity is the introduction of patient self-diagnosis, where a patient attempts to diagnose their own medical conditions from various sources. While the patient sometimes arrives at an accurate conclusion, they are more often led toward misdiagnosis by their over-emphasis on bias-validating information. In this work, we present a variety of LLMs with multiple-choice questions from United States medical board exams that are modified to include self-diagnostic reports from patients. Our findings highlight that when a patient proposes incorrect bias-validating…
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
