CLIMB: A Benchmark of Clinical Bias in Large Language Models

Yubo Zhang; Shudi Hou; Mingyu Derek Ma; Wei Wang; Muhao Chen; Jieyu Zhao

arXiv:2407.05250 · cs.CL · November 18, 2024

TL;DR

This paper introduces CLIMB, a comprehensive benchmark for evaluating intrinsic and extrinsic clinical bias in large language models, highlighting prevalent biases and emphasizing the need for mitigation in clinical applications.

Contribution

It presents the first systematic benchmark, including a novel metric AssocMAD and counterfactual evaluation methods, to assess clinical bias in LLMs.

Findings

1. Prevalent intrinsic and extrinsic biases found in popular LLMs.
2. AssocMAD effectively measures demographic disparities.
3. Counterfactual interventions reveal bias in clinical diagnosis tasks.

Abstract

Large language models (LLMs) are increasingly applied to clinical decision-making. However, their potential to exhibit bias poses significant risks to clinical equity. Currently, there is a lack of benchmarks that systematically evaluate such clinical bias in LLMs. In downstream tasks, some biases can be avoided, for example by instructing the model to answer "I'm not sure...", but the internal bias hidden within the model remains understudied. We introduce CLIMB (shorthand for A Benchmark of Clinical Bias in Large Language Models), a pioneering comprehensive benchmark to evaluate both intrinsic (within LLMs) and extrinsic (on downstream tasks) bias in LLMs for clinical decision tasks. Notably, for intrinsic bias, we introduce a novel metric, AssocMAD, to assess the disparities of LLMs across multiple demographic groups. Additionally, we leverage counterfactual intervention to…
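To make the idea of counterfactual intervention concrete, here is a minimal sketch, not the paper's implementation. It assumes the intervention swaps a demographic term in a clinical note and checks whether a model's prediction changes. The swap table `SWAPS`, the helpers `counterfactual` and `flip_rate`, and the toy `predict` function in the usage note are all hypothetical names introduced for illustration; the benchmark's actual attribute lists and metrics are not given in this summary.

```python
import re
from typing import Callable, Dict, Sequence

# Hypothetical swap table, for illustration only. The attribute lists
# actually used by the benchmark are not stated in this summary.
SWAPS: Dict[str, str] = {
    "male": "female",
    "female": "male",
    "man": "woman",
    "woman": "man",
}

# Match whole words only, so "male" inside "female" is never rewritten.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b", re.IGNORECASE
)


def counterfactual(text: str) -> str:
    """Return the text with each demographic term replaced by its counterpart."""

    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Preserve a leading capital letter if the original had one.
        return swapped.capitalize() if word[0].isupper() else swapped

    return _PATTERN.sub(repl, text)


def flip_rate(notes: Sequence[str], predict: Callable[[str], str]) -> float:
    """Fraction of notes whose prediction changes under the demographic swap."""
    if not notes:
        return 0.0
    flips = sum(predict(note) != predict(counterfactual(note)) for note in notes)
    return flips / len(notes)
```

In use, `predict` would wrap a call to the model under test. A deliberately biased toy stand-in such as `lambda note: "anxiety" if "female" in note else "cardiac"` changes its answer whenever the swap introduces or removes the word "female", so notes mentioning sex contribute to the flip rate while notes without any demographic term do not.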

Code & Models

Repositories

uscnlp-lime/climb (Official)

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Natural Language Processing Techniques

Methods: LLaMA