Why Aggregate Accuracy is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems
Khalid Adnan Alsayed

TL;DR
This paper critiques the reliance on aggregate accuracy for evaluating fairness in law enforcement facial recognition, emphasizing the need for subgroup-level analysis to prevent societal harm.
Contribution
It demonstrates how aggregate accuracy can mask demographic disparities and advocates for fairness-aware evaluation methods in high-stakes facial recognition systems.
Findings
Aggregate accuracy can hide subgroup disparities in error rates.
Systems with similar accuracy can have vastly different fairness profiles.
Fairness-aware evaluation is crucial for responsible deployment.
Abstract
Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demographic groups, leading to disproportionate error rates and potential harm. This paper argues that aggregate accuracy is an insufficient metric for evaluating the fairness and reliability of facial recognition systems in high-stakes environments. Through analysis of subgroup-level error distributions, including false positive rates (FPR) and false negative rates (FNR), the paper demonstrates how aggregate performance metrics can obscure critical disparities across demographic groups. Empirical observations show that systems with similar overall accuracy can exhibit substantially different fairness…
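To make the core argument concrete, here is a minimal illustrative sketch. It does not reproduce the paper's data or method. All counts, and the names `Counts`, `pooled`, `fpr_gap`, `system_a`, and `system_b`, are hypothetical. The sketch treats a "positive" as a genuine match and computes aggregate accuracy by pooling counts across two subgroups. It then computes FPR and FNR per subgroup, showing that two systems can share the same aggregate accuracy while distributing their errors very differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion-matrix counts for one demographic subgroup."""
    tp: int  # genuine matches correctly accepted
    fn: int  # genuine matches wrongly rejected (false negatives)
    fp: int  # non-matches wrongly accepted (false positives)
    tn: int  # non-matches correctly rejected


def accuracy(c: Counts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def fpr(c: Counts) -> float:
    # False positive rate: share of non-matching pairs flagged as matches.
    return c.fp / (c.fp + c.tn)


def fnr(c: Counts) -> float:
    # False negative rate: share of genuine matches that were missed.
    return c.fn / (c.fn + c.tp)


def pooled(groups: dict) -> Counts:
    # Aggregate view: sum counts across subgroups, discarding group labels.
    return Counts(
        tp=sum(g.tp for g in groups.values()),
        fn=sum(g.fn for g in groups.values()),
        fp=sum(g.fp for g in groups.values()),
        tn=sum(g.tn for g in groups.values()),
    )


def fpr_gap(groups: dict) -> float:
    # Largest difference in FPR between any two subgroups.
    rates = [fpr(g) for g in groups.values()]
    return max(rates) - min(rates)


# Two hypothetical systems, each evaluated on 1,000 comparisons
# (250 genuine and 250 non-matching pairs per subgroup).
system_a = {
    "group_x": Counts(tp=240, fn=10, fp=10, tn=240),
    "group_y": Counts(tp=240, fn=10, fp=10, tn=240),
}
system_b = {
    "group_x": Counts(tp=242, fn=8, fp=2, tn=248),
    "group_y": Counts(tp=245, fn=5, fp=25, tn=225),
}

for name, groups in [("A", system_a), ("B", system_b)]:
    print(f"System {name}: aggregate accuracy = {accuracy(pooled(groups)):.3f}")
    for label, c in groups.items():
        print(f"  {label}: FPR = {fpr(c):.3f}, FNR = {fnr(c):.3f}")
    print(f"  FPR gap across subgroups = {fpr_gap(groups):.3f}")
```

Both hypothetical systems report an aggregate accuracy of 0.960. System A's errors are spread evenly. System B wrongly flags non-matching pairs in group_y at a rate of 0.100, compared with 0.008 in group_x. Pooling the counts before computing accuracy erases exactly this difference, which is why the sketch reports per-subgroup rates alongside the aggregate.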
