A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models
Stephen R. Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomasev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal

TL;DR
This paper introduces a comprehensive framework and datasets for identifying biases in large language models used in healthcare, aiming to improve health equity by surfacing potential harms in model-generated medical answers.
Contribution
It presents a multifactorial human assessment framework and the EquityMedQA dataset, enabling more effective detection of biases in LLMs for medical applications.
Findings
Our approach uncovers biases missed by narrower evaluations.
Diverse assessment methods and raters improve bias detection.
The methodology highlights the importance of participatory review processes.
Abstract
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study,…
Taxonomy
Topics: Healthcare Systems and Public Health
