MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering
Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E. Priebe, Eric Horvitz

TL;DR
This paper introduces MedFuzz, an adversarial testing method that evaluates the robustness of large language models in medical question answering by challenging their performance under realistic, unpredictable conditions.
Contribution
The paper presents MedFuzz, a novel adversarial approach to assess LLM robustness in medical QA, highlighting potential vulnerabilities not evident in standard benchmarks.
Findings
MedFuzz successfully confounds LLMs with realistic question modifications.
Benchmark performance drops significantly under MedFuzz attacks.
The permutation test confirms the statistical significance of the attacks.
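As a rough illustration of how a permutation test can check whether an accuracy drop is more than chance, the sketch below runs a generic paired sign-flip permutation test on per-question correctness before and after an attack. It is not necessarily the procedure used in the paper; the function name, its parameters, and the toy data are assumptions made for this example.

```python
import random


def paired_permutation_test(before, after, n_permutations=10_000, seed=0):
    """One-sided paired sign-flip permutation test (generic sketch).

    before, after: per-question correctness (1 = correct, 0 = wrong),
    aligned so that index i refers to the same question in both lists.
    Returns (observed mean drop in correctness, estimated p-value).
    """
    if len(before) != len(after) or not before:
        raise ValueError("inputs must be non-empty and of equal length")

    rng = random.Random(seed)
    diffs = [b - a for b, a in zip(before, after)]
    observed = sum(diffs) / len(diffs)

    # Under the null hypothesis the attack has no effect, so the sign of
    # each paired difference is arbitrary and can be flipped at random.
    extreme = 0
    for _ in range(n_permutations):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if permuted >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (hypothetical): 20 questions, all correct before the attack,
# 15 of them answered incorrectly afterwards.
before = [1] * 20
after = [0] * 15 + [1] * 5
drop, p = paired_permutation_test(before, after)
```

A small p-value here would suggest that the observed drop is unlikely if the modifications had no effect on the model's answers.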
Abstract
Large language models (LLMs) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions consistent with quantifying LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help the LLM generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong…
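The idea of repeatedly modifying a benchmark question until the model's answer changes can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the paper's actual procedure: `fuzz_question`, the `attacker` and `target` callables, and the toy stand-ins below are all hypothetical, and a real setup would call language models in their place.

```python
from typing import Callable, Optional, Tuple


def fuzz_question(
    question: str,
    correct_answer: str,
    target: Callable[[str], str],
    attacker: Callable[[str, int], str],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Modify a question until the target answers incorrectly (sketch).

    target:   hypothetical callable mapping a question to an answer label.
    attacker: hypothetical callable that rewrites the question; it is meant
              to leave the correct answer unchanged.
    Returns (successful modified question or None, attempts used).
    """
    current = question
    for attempt in range(1, max_attempts + 1):
        current = attacker(current, attempt)
        if target(current) != correct_answer:
            return current, attempt
    return None, max_attempts


# Toy stand-ins (assumptions for illustration only).
def toy_attacker(question: str, attempt: int) -> str:
    # Appends one more irrelevant detail on every attempt.
    return question + f" [distractor {attempt}]"


def toy_target(question: str) -> str:
    # Answers "A" until it has seen two distractors, then flips to "B".
    return "B" if question.count("[distractor") >= 2 else "A"


modified, attempts = fuzz_question(
    "Which drug is first-line?", "A", toy_target, toy_attacker
)
```

In this toy run the target flips its answer on the second modification, so the loop stops early and returns the modified question together with the number of attempts it took.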
Taxonomy
Topics: Topic Modeling
