Robustness Disparities in Commercial Face Detection
Samuel Dooley, Tom Goldstein, John P. Dickerson

TL;DR
This paper benchmarks the robustness of commercial face detection systems under natural noise, revealing disparities based on age, gender presentation, skin type, and lighting conditions.
Contribution
It provides the first detailed robustness benchmark of Amazon Rekognition, Microsoft Azure, and Google Cloud face detection systems under real-world perturbations.
Findings
Photos of individuals who are older, masculine presenting, of darker skin type, or dimly lit are more susceptible to detection errors.
Robustness varies significantly across different demographic groups.
Natural perturbations disproportionately affect certain face types.
Abstract
Facial detection and analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Critiques that focus on system performance analyze disparity of the system's output, i.e., how frequently a face is detected for different Fitzpatrick skin types or perceived genders. However, we focus on the robustness of these system outputs under noisy natural perturbations. We present the first-of-its-kind detailed benchmark of the robustness of three such systems: Amazon Rekognition, Microsoft Azure, and Google Cloud Platform. We use both standard and recently released academic facial datasets to quantitatively analyze trends in robustness for each. Across all the datasets and systems, we generally find that photos of individuals who are older, masculine presenting, of darker skin type, or have dim lighting are more susceptible to errors than…
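To make the idea of a per-group robustness comparison concrete, here is a minimal sketch in Python. It is not the paper's metric or code: the function name `robustness_error_by_group`, the record fields, and the group labels are all hypothetical, and the toy records are synthetic placeholders rather than results. It assumes each image has already been sent to a detector twice, once clean and once perturbed, and counts how often a face found in the clean image is lost after perturbation.

```python
from collections import defaultdict


def robustness_error_by_group(records):
    """Return {group: share of clean detections lost after perturbation}.

    Each record is a dict with hypothetical keys:
      'group'              - demographic or lighting label for the image
      'clean_detected'     - True if a face was found in the clean image
      'perturbed_detected' - True if a face was found in the perturbed image
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        # Only images whose clean version was detected can show a
        # robustness failure, so the others are skipped.
        if not record["clean_detected"]:
            continue
        totals[record["group"]] += 1
        if not record["perturbed_detected"]:
            errors[record["group"]] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Synthetic placeholder records, NOT data from the paper.
toy_records = [
    {"group": "group_a", "clean_detected": True, "perturbed_detected": True},
    {"group": "group_a", "clean_detected": True, "perturbed_detected": True},
    {"group": "group_a", "clean_detected": True, "perturbed_detected": True},
    {"group": "group_a", "clean_detected": True, "perturbed_detected": False},
    {"group": "group_b", "clean_detected": True, "perturbed_detected": False},
    {"group": "group_b", "clean_detected": True, "perturbed_detected": True},
    {"group": "group_b", "clean_detected": False, "perturbed_detected": False},
]

rates = robustness_error_by_group(toy_records)
for group in sorted(rates):
    print(group, rates[group])
```

A gap between the per-group rates is the kind of disparity the abstract describes; a real analysis would also need many images per group and some measure of statistical uncertainty.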
Taxonomy
Topics: Face recognition and analysis · Face Recognition and Perception · Ethics and Social Impacts of AI
