Auditing demographic bias in AI-based emergency police dispatch: a cross-lingual evaluation of eleven large language models

William Guey; Wei Zhang; Pierrick Bougault; Yi Wang; Bertan Ucar; Vitor D. de Moura; José O. Gomes

arXiv:2605.01451·cs.CL·May 5, 2026

TL;DR

This study audits demographic bias in eleven large language models applied to emergency police dispatch, in English and Mandarin Chinese. Bias varies by demographic category, language, and scenario ambiguity, a pattern with implications for deployment.

Contribution

Introduces a cross-lingual audit framework for assessing demographic bias in LLMs used for emergency dispatch, and documents how bias varies across demographic categories and between languages.

Findings

01

Bias emerges when incident severity is ambiguous but largely disappears when the call content clearly determines the priority.

02

Religious appearance has the largest bias effect, followed by gender and race.

03

Gender bias is amplified in Mandarin Chinese, whereas race bias is more pronounced in English.

Abstract

Large language models (LLMs) are rapidly being integrated into high-stakes public safety systems, including emergency call triage and dispatch decision support, yet their demographic fairness in this context remains largely untested. Here we introduce a cross-lingual audit framework that operationalizes the Police Priority Dispatch System as a five-level ordinal classification task and applies a controlled minimal-pair design to isolate the effect of demographic cues. Across 19,800 model outputs spanning 11 frontier models, 15 scenario pairs, three demographic categories (religious appearance, gender, and race), and two languages (English and Mandarin Chinese), we find that demographic bias emerges systematically when incident severity is ambiguous but largely disappears when the operational priority is clearly determined by call content. Bias magnitude varies by demographic axis, with…
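
The minimal-pair idea in the abstract can be illustrated with a small sketch. It assumes each model output has already been parsed into an integer priority level from 1 to 5, and it compares two variants of the same scenario that differ only in a demographic cue. The function name `priority_gap`, the integer encoding, the sample values, and the choice of a difference of means as the statistic are all assumptions made for illustration; the abstract does not state which bias metric the authors use.

```python
from statistics import mean

# Hypothetical encoding: the five ordinal priority levels as integers 1..5.
VALID_LEVELS = range(1, 6)


def priority_gap(baseline_levels, variant_levels):
    """Difference in mean priority level between two prompt variants.

    Both arguments are lists of integer priority levels returned by one
    model for the same scenario, where the two prompts differ only in a
    demographic cue. A value of 0 means no measured shift.
    """
    for level in list(baseline_levels) + list(variant_levels):
        if level not in VALID_LEVELS:
            raise ValueError(f"priority level out of range: {level}")
    return mean(variant_levels) - mean(baseline_levels)


# Illustrative values only, not data from the paper.
baseline = [3, 3, 2, 3]
variant = [2, 3, 2, 2]
gap = priority_gap(baseline, variant)
print(gap)  # -0.5
```

A nonzero gap on a scenario whose severity is ambiguous, alongside a gap near zero on a scenario whose priority is clear from the call content, would match the pattern the abstract reports. Because the levels are ordinal, a rank-based statistic may be a better choice than a difference of means in a real audit.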
