Red Teaming Large Language Models for Healthcare

Vahid Balazadeh; Michael Cooper; David Pellow; Atousa Assadi; Jennifer Bell; Mark Coatsworth; Kaivalya Deshpande; Jim Fackler; Gabriel Funingana; Spencer Gable-Cook; Anirudh Gangadhar; Abhishek Jaiswal; Sumanth Kaja; Christopher Khoury; Amrit Krishnan; Randy Lin; Kaden McKeen; Sara Naimimohasses; Khashayar Namdar; Aviraj Newatia; Allan Pang; Anshul Pattoo; Sameer Peesapati; Diana Prepelita; Bogdana Rakova; Saba Sadatamin; Rafael Schulman; Ajay Shah; Syed Azhar Shah; Syed Ahmar Shah; Babak Taati; Balagopal Unnikrishnan; Iñigo Urteaga; Stephanie Williams; and Rahul G Krishnan

arXiv:2505.00467·cs.CL·July 14, 2025

TL;DR

This paper reports on a workshop in which participants with clinical and computational expertise probed large language models for vulnerabilities: realistic clinical prompts whose responses could cause clinical harm. It categorises the vulnerabilities found and assesses how they replicate across the models provided.

Contribution

It introduces a collaborative red-teaming process combining clinical and technical expertise to identify and categorize vulnerabilities in healthcare LLMs.

Findings

01 Identified multiple vulnerabilities that could cause clinical harm

02 Categorized types of vulnerabilities found in LLMs

03 Assessed vulnerability prevalence across different LLMs

Abstract

We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may not be recognised by LLM developers lacking clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all LLMs provided.
