Toward Revealing Nuanced Biases in Medical LLMs
Farzana Islam Adiba, Rahmatollah Beheshti

TL;DR
This paper introduces a novel framework that combines knowledge graphs (KGs) and auxiliary LLMs with adversarial (red-teaming) techniques to systematically uncover nuanced biases in medical LLMs and to improve how such biases are detected and evaluated.
Contribution
The study presents a new approach integrating knowledge graphs, auxiliary LLMs, and adversarial red teaming to reveal complex biases in medical LLMs more effectively than existing methods.
Findings
The framework outperforms existing methods in bias detection.
Scalable approach across multiple datasets and bias types.
Enhanced generation of bias evaluation questions.
Abstract
Large language models (LLMs) used in medical applications are known to be prone to exhibiting biased and unfair patterns. Prior to deploying these in clinical decision-making, it is crucial to identify such bias patterns to enable effective mitigation and minimize negative impacts. In this study, we present a novel framework combining knowledge graphs (KGs) with auxiliary (agentic) LLMs to systematically reveal complex bias patterns in medical LLMs. The proposed approach integrates adversarial perturbation (red teaming) techniques to identify subtle bias patterns and adopts a customized multi-hop characterization of KGs to enhance the systematic evaluation of target LLMs. It aims not only to generate more effective red-teaming questions for bias evaluation but also to utilize those questions more effectively in revealing complex biases. Through a series of comprehensive experiments on…
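The abstract names two ingredients: multi-hop traversal of a KG and adversarial perturbation of evaluation questions. The sketch below shows both in a generic form and assumes a toy adjacency-dict KG and a fixed question template. The graph contents and the function names `multi_hop_paths`, `path_to_question`, and `counterfactual_questions` are hypothetical. This is not the authors' method. The paper uses auxiliary LLMs to generate and refine questions, and here a fixed template stands in for that step. The perturbation shown is a simple counterfactual swap of demographic attributes.

```python
from itertools import product

# Toy knowledge graph (hypothetical data, not from the paper):
# each head entity maps to a list of (relation, tail) edges.
KG = {
    "hypertension": [("treated_by", "ACE inhibitor")],
    "ACE inhibitor": [("has_side_effect", "dry cough")],
    "dry cough": [],
}


def multi_hop_paths(kg, start, max_hops):
    """Return every simple path of 1..max_hops edges that starts at `start`.

    Each path is a list of (head, relation, tail) triples."""
    paths = []

    def dfs(node, path, visited):
        if path:
            paths.append(list(path))  # record a copy of the current path
        if len(path) == max_hops:
            return
        for rel, tail in kg.get(node, []):
            if tail in visited:  # skip cycles
                continue
            path.append((node, rel, tail))
            visited.add(tail)
            dfs(tail, path, visited)
            path.pop()
            visited.remove(tail)

    dfs(start, [], {start})
    return paths


def path_to_question(path, patient):
    """Fill a fixed (hypothetical) template with the facts along one KG path."""
    start = path[0][0]
    facts = "; ".join(f"{h} {r.replace('_', ' ')} {t}" for h, r, t in path)
    return (f"Patient: {patient}, presenting with {start}. "
            f"Relevant facts: {facts}. What treatment would you recommend?")


def counterfactual_questions(path, attributes):
    """Build one question per combination of demographic attribute values.

    Only the patient description varies, so differences in a target model's
    answers across these variants are candidate bias signals."""
    keys = list(attributes)
    variants = []
    for values in product(*(attributes[k] for k in keys)):
        patient = " ".join(values)
        variants.append((dict(zip(keys, values)), path_to_question(path, patient)))
    return variants


if __name__ == "__main__":
    paths = multi_hop_paths(KG, "hypertension", max_hops=2)
    demographics = {"race": ["Black", "White"], "sex": ["female", "male"]}
    for attrs, question in counterfactual_questions(paths[-1], demographics):
        print(attrs, "->", question)
```

In a bias evaluation, each variant would be sent to the target medical LLM and the answers compared. All variants carry the same clinical facts, so systematic differences in the answers point to attribute-dependent behavior worth inspecting.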
