MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Robert Osazuwa Ness; Katie Matton; Hayden Helm; Sheng Zhang; Junaid Bajwa; Carey E. Priebe; Eric Horvitz

arXiv:2406.06573·cs.CL·September 4, 2024·1 cites

Open Access

TL;DR

This paper introduces MedFuzz, an adversarial testing method that evaluates the robustness of large language models in medical question answering by modifying benchmark questions in ways that violate benchmark assumptions and aim to confound the model.

Contribution

The paper presents MedFuzz, a novel adversarial approach to assess LLM robustness in medical QA, highlighting potential vulnerabilities not evident in standard benchmarks.

Findings

01

MedFuzz successfully confounds LLMs with realistic question modifications.

02

Benchmark performance drops significantly under MedFuzz attacks.

03

The permutation test confirms the statistical significance of the attacks.
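
As a rough illustration of how a permutation test can check whether an accuracy drop is larger than chance would explain, here is a minimal sketch in Python. It is a generic paired sign-flip permutation test, not the paper's procedure: the function name `paired_permutation_test`, the 0/1 correctness encoding, and the choice of a one-sided test are all assumptions made for this example.

```python
import random


def paired_permutation_test(before, after, n_resamples=10_000, seed=0):
    """Generic one-sided paired permutation (sign-flip) test.

    `before` and `after` are hypothetical 0/1 lists: whether the model
    answered each question correctly on the original item and on the
    modified item. Returns (observed accuracy drop, p-value).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if not before:
        raise ValueError("need at least one question")

    # Per-question change in correctness: 1 = broke, -1 = fixed, 0 = same.
    diffs = [b - a for b, a in zip(before, after)]
    observed_sum = sum(diffs)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the before/after labels are exchangeable within
        # each pair, so each difference keeps or flips its sign at random.
        permuted_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if permuted_sum >= observed_sum:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed_sum / len(diffs), p_value
```

A caller would pass per-question correctness on the original and modified items and read a small p-value as evidence that the drop is unlikely under the null of no effect. The sign-flip form is chosen here because both measurements come from the same questions; an unpaired design would shuffle group labels instead.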

Abstract

Large language models (LLMs) have achieved impressive performance on medical question-answering benchmarks. However, high benchmark accuracy does not imply that the performance generalizes to real-world clinical settings. Medical question-answering benchmarks rely on assumptions that make it possible to quantify LLM performance but that may not hold in the open world of the clinic. Yet LLMs learn broad knowledge that can help them generalize to practical conditions regardless of unrealistic assumptions in celebrated benchmarks. We seek to quantify how well LLM medical question-answering benchmark performance generalizes when benchmark assumptions are violated. Specifically, we present an adversarial method that we call MedFuzz (for medical fuzzing). MedFuzz attempts to modify benchmark questions in ways aimed at confounding the LLM. We demonstrate the approach by targeting strong…

Taxonomy

Topics: Topic Modeling