EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage
Richard J. Young, Alice M. Matthews

TL;DR
EQUITRIAGE is a comprehensive fairness audit of LLM-based emergency department triage. It reveals gender biases and their underlying mechanisms, measures the effect of interventions, and argues that bias mitigation must be tailored to each model before clinical use.
Contribution
This study introduces EQUITRIAGE, a systematic framework for auditing gender bias in LLM-based triage models, and shows that models differ both in their fairness properties and in how they respond to interventions.
Findings
All five models had gender flip rates above the pre-registered 5% threshold (9.9% to 43.8%).
Two models (DeepSeek-V3.1, Gemini-3-Flash) showed directional female undertriage; two were near parity.
Demographic blinding significantly reduced gender flip rates.
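Demographic blinding can be pictured as removing gender cues from a vignette before the model sees it. The sketch below is a minimal illustration that assumes a simple word-substitution approach. The term map `NEUTRAL` and the function `blind` are hypothetical and do not reproduce the paper's procedure, which this summary does not describe.

```python
import re

# Hypothetical gendered-term -> neutral-term map (an assumption, not the
# paper's blinding rules). "her" maps to "their", which is right for the
# possessive but not the object form; verb agreement is not adjusted.
NEUTRAL = {
    "woman": "patient", "man": "patient",
    "female": "patient", "male": "patient",
    "she": "they", "he": "they",
    "her": "their", "his": "their", "him": "them",
}

# Word boundaries stop "he" from matching inside "the" or "chest".
_PATTERN = re.compile(r"\b(" + "|".join(NEUTRAL) + r")\b", re.IGNORECASE)


def blind(vignette: str) -> str:
    """Replace gendered words with neutral ones, preserving a leading capital."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = NEUTRAL[word.lower()]
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(swap, vignette)


print(blind("Female, 58, chest pain radiating to her left arm."))
# Patient, 58, chest pain radiating to their left arm.
```

A real blinding step would also need to handle names, titles, and sex-specific history, which simple word lists miss.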
Abstract
Emergency department triage assigns patients an acuity score that determines treatment priority, and clinical evidence documents persistent gender disparities in human acuity assessment. As hospitals pilot large language models (LLMs) as triage decision support, a critical question is whether these models reproduce or mitigate known biases. We present EQUITRIAGE, a fairness audit of LLM-based Emergency Severity Index (ESI) assignment evaluating five models (Gemini-3-Flash, Nemotron-3-Super, DeepSeek-V3.1, Mistral-Small-3.2, GPT-4.1-Nano) across 374,275 evaluations on 18,714 MIMIC-IV-ED vignettes under four prompt strategies. Of the 9,368 original vignettes, 9,346 are paired with a gender-swapped counterfactual. All five models produced flip rates above a pre-registered 5% threshold (9.9% to 43.8%). Two showed directional female undertriage (DeepSeek F/M 2.15:1, Gemini 1.34:1); two were near-parity; one had high sensitivity…
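The flip rate and the F/M undertriage ratio can be made concrete with a short sketch. It assumes that a flip is any counterfactual pair whose ESI level differs between the female and male versions, and that undertriage means a numerically higher (less urgent) ESI level, since ESI 1 is the most urgent. The abstract does not spell out these definitions. `PairResult`, `flip_rate`, and `female_to_male_undertriage_ratio` are hypothetical names, and the toy data is invented, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """ESI levels (1 = most urgent, 5 = least) for one vignette and its gender swap."""
    esi_female: int  # ESI assigned to the female version of the vignette
    esi_male: int    # ESI assigned to the male version of the vignette


THRESHOLD = 0.05  # pre-registered flip-rate threshold stated in the abstract


def flip_rate(pairs: list[PairResult]) -> float:
    """Fraction of pairs whose ESI changes when only gender is swapped (assumed definition)."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(p.esi_female != p.esi_male for p in pairs) / len(pairs)


def female_to_male_undertriage_ratio(pairs: list[PairResult]) -> float:
    """Flips that undertriage the female version divided by flips that undertriage the male version.

    Assumes undertriage = a numerically higher (less urgent) ESI level.
    """
    female_under = sum(p.esi_female > p.esi_male for p in pairs)
    male_under = sum(p.esi_male > p.esi_female for p in pairs)
    return female_under / male_under if male_under else float("inf")


# Invented toy values for illustration only.
toy = [PairResult(3, 3), PairResult(4, 3), PairResult(3, 2), PairResult(2, 3), PairResult(3, 3)]
print(f"flip rate: {flip_rate(toy):.1%}")  # flip rate: 60.0%
print(f"F/M ratio: {female_to_male_undertriage_ratio(toy):.2f}:1")  # F/M ratio: 2.00:1
print(flip_rate(toy) > THRESHOLD)  # True
```

Under these assumptions, a ratio near 1:1 with a high flip rate would indicate gender sensitivity without a consistent direction, while a ratio well above 1:1 indicates directional female undertriage.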
