The Misuse of AUC: What High Impact Risk Assessment Gets Wrong
Kweku Kwegyir-Aggrey, Marissa Gerchick, Malika Mohan, Aaron Horowitz, Suresh Venkatasubramanian

TL;DR
This paper critiques the widespread misuse of AUC in high-impact risk assessments, highlighting how it leads to invalid model validation and obscures critical decision factors.
Contribution
It clarifies the original purpose of AUC, demonstrates its misapplication in practice, and advocates for more robust validation methods in risk assessment models.
Findings
AUC is often misused beyond its intended purpose.
Misuse of AUC can lead to invalid model comparisons.
Current practices overlook decision thresholds and fairness concerns.
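The findings above can be illustrated with a small sketch. This example is not taken from the paper: the labels, scores, and thresholds are hypothetical toy values, and `auc` and `rates_at_threshold` are helper names introduced here for illustration. It computes AUC as the fraction of (positive, negative) pairs that a model ranks correctly, then compares two models at fixed decision thresholds.

```python
from itertools import product

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def rates_at_threshold(labels, scores, threshold):
    """Return (true positive rate, false positive rate) when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg

# Hypothetical toy data: four positive cases followed by four negative cases.
labels   = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.5, 0.4, 0.7, 0.6, 0.3, 0.2]  # model A (assumed)
scores_b = [0.8, 0.7, 0.6, 0.5, 0.9, 0.4, 0.3, 0.2]  # model B (assumed)

print("AUC A:", auc(labels, scores_a))
print("AUC B:", auc(labels, scores_b))

# Two illustrative thresholds: a strict one and a lenient one.
for t in (0.75, 0.45):
    print("threshold", t,
          "A (TPR, FPR):", rates_at_threshold(labels, scores_a, t),
          "B (TPR, FPR):", rates_at_threshold(labels, scores_b, t))
```

In this toy setup both models have the same AUC, yet model A has the better error rates at the strict threshold and model B at the lenient one. Which model is preferable therefore depends on where the threshold is set, a choice that an AUC comparison alone does not reveal.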
Abstract
When determining which machine learning model best performs some high impact risk assessment task, practitioners commonly use the Area under the Curve (AUC) to defend and validate their model choices. In this paper, we argue that the current use and understanding of AUC as a model performance metric misunderstands the way the metric was intended to be used. To this end, we characterize the misuse of AUC and illustrate how this misuse negatively manifests in the real world across several risk assessment domains. We locate this disconnect in the way the original interpretation of AUC has shifted over time to the point where issues pertaining to decision thresholds, class balance, statistical uncertainty, and protected groups remain unaddressed by AUC-based model comparisons, and where model choices that should be the purview of policymakers are hidden behind the veil of mathematical…
Taxonomy
Topics: Risk and Safety Analysis · Explainable Artificial Intelligence (XAI)
