Interpretation of the Area Under the ROC Curve for Risk Prediction Models
Ralph H. Stern

TL;DR
This paper clarifies how the ROC curve AUC for risk prediction models depends on population risk distribution and suggests interpreting it as a measure of dispersion rather than discrimination.
Contribution
It provides a mathematical formula linking ROC AUC to population risk distribution and highlights the importance of risk dispersion in model evaluation.
Findings
ROC AUC depends on mean population risk and risk distribution
Analytic formulas for ROC AUC for various risk distributions
Overlap measure provides equivalent information to ROC AUC
Abstract
The area under the curve (AUC) of the receiver operating characteristic (ROC) curve evaluates the separation between the risk distributions of patients and nonpatients, that is, discrimination. For risk prediction models, these risk distributions can be derived from the population risk distribution, so they are not independent as they are in diagnosis. A ROC curve AUC formula based on the underlying population risk distribution clarifies how discrimination is defined mathematically and shows that generating the equivalent c-statistic effects a Monte Carlo integration of the formula. For a selection of continuous risk distributions, exact analytic formulas or numerical results for the ROC curve AUC and overlap measure are presented and demonstrate a linear or near-linear dependence on the standard deviation of the risk distribution. The ROC curve AUC is also shown to be highly dependent on the mean population risk, a distinction from the independence from…
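The relationship described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper, and it assumes a calibrated model: a person with predicted risk r becomes a patient with probability r. Under that assumption, a population risk density f(r) with mean μ gives patients the density r f(r) / μ and nonpatients the density (1 - r) f(r) / (1 - μ), so the AUC is the double integral of r1 f(r1) (1 - r2) f(r2) over r1 > r2, divided by μ (1 - μ). The function names `auc_from_density`, `c_statistic`, and `simulate_c_statistic` are hypothetical and chosen only for this example.

```python
import random


def auc_from_density(density, n=2000):
    """Numerically integrate the AUC implied by a population risk density on [0, 1].

    Assumes calibrated risks: patients are weighted by r, nonpatients by (1 - r).
    """
    step = 1.0 / n
    grid = [(i + 0.5) * step for i in range(n)]      # midpoints of n equal bins
    mass = [density(r) * step for r in grid]
    total = sum(mass)
    mass = [m / total for m in mass]                 # renormalise to total mass 1
    mean_risk = sum(r * m for r, m in zip(grid, mass))

    below = 0.0   # unnormalised nonpatient mass at strictly lower risk
    acc = 0.0
    for r, m in zip(grid, mass):
        nonpatient = (1.0 - r) * m
        # Patient mass in this bin times nonpatient mass below it;
        # the same-bin pairing is counted as a tie (half weight).
        acc += r * m * (below + 0.5 * nonpatient)
        below += nonpatient
    return acc / (mean_risk * (1.0 - mean_risk))


def c_statistic(risks, outcomes):
    """Rank-based c-statistic (Mann-Whitney form); ties are not handled."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if outcomes[i])
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_c_statistic(sample_risk, n_people=20000, seed=0):
    """Monte Carlo estimate: draw risks, draw outcomes from them, compute c."""
    rng = random.Random(seed)
    risks = [sample_risk(rng) for _ in range(n_people)]
    outcomes = [1 if rng.random() < r else 0 for r in risks]
    return c_statistic(risks, outcomes)


# Illustrative case (an assumption, not a dataset from the paper):
# risk uniformly distributed on [0, 1], for which the integral equals 5/6.
auc_uniform = auc_from_density(lambda r: 1.0)
c_uniform = simulate_c_statistic(lambda rng: rng.random())
print(auc_uniform, c_uniform)
```

For the uniform case the integral can be done by hand and equals 5/6, so both the numerical integration and the simulated c-statistic should land close to that value. The simulated value carries sampling error that shrinks as `n_people` grows.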
Taxonomy
Topics: Medical Coding and Health Information · Clinical practice guidelines implementation · Reliability and Agreement in Measurement
