Connecting population-level AUC and latent scale-invariant $R^2$ via Semiparametric Gaussian Copula and rank correlations
Debangan Dey, Vadim Zipunnikov

TL;DR
This paper introduces a latent scale-invariant $R^2$ measure and links it to AUC through a semiparametric Gaussian copula, so that classification accuracy and explained variation for a binary outcome can be evaluated consistently with each other.
Contribution
It develops a latent $R^2$ measure connected to AUC via a semiparametric Gaussian copula, with estimation methods based on rank correlations and applications to survey data.
Findings
Established the relationship between AUC and the latent $R^2$ via a monotone function.
Proposed Quadrant rank correlation as a robust semiparametric AUC estimator.
Demonstrated consistent AUC estimation in complex survey designs.
Abstract
Area Under the Curve (AUC) is arguably the most popular measure of classification accuracy. We use a semiparametric framework to introduce a latent scale-invariant $R^2$, a novel measure of variation explained for an observed binary outcome and an observed continuous predictor, and then directly link the latent $R^2$ to AUC. This enables a mutually consistent simultaneous use of AUC as a measure of classification accuracy and the latent $R^2$ as a scale-invariant measure of explained variation. Specifically, we employ the Semiparametric Gaussian Copula (SGC) to model the joint dependence between the observed binary outcome and the observed continuous predictor via the correlation of latent standard normal random variables. Under SGC, we show how both the population-level AUC and the latent scale-invariant $R^2$, defined as a squared latent correlation, can be estimated using any of the four rank statistics…
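To make the setup in the abstract concrete, the following is a minimal simulation sketch, not the paper's estimation procedure. It assumes one common way to write a Gaussian copula for a binary outcome and a continuous predictor: two latent standard normals with correlation r, the outcome obtained by thresholding the first, and the predictor obtained by a strictly increasing transform of the second. The function names `simulate_sgc` and `empirical_auc`, the threshold of 0, and the `exp` transform are all hypothetical choices made for illustration. AUC is computed with the standard Mann-Whitney rank formula rather than with the rank-correlation estimators the paper proposes.

```python
import numpy as np


def simulate_sgc(n, r, threshold=0.0, transform=np.exp, seed=0):
    """Draw (y, x) from an assumed latent-Gaussian model (illustrative only).

    z1, z2 are standard normal with correlation r;
    y = 1(z1 > threshold) is the observed binary outcome;
    x = transform(z2) is the observed continuous predictor.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = r * z1 + np.sqrt(1.0 - r**2) * rng.standard_normal(n)
    y = (z1 > threshold).astype(int)
    x = transform(z2)
    return y, x


def empirical_auc(y, x):
    """Mann-Whitney estimate of P(x_case > x_control); assumes no ties in x."""
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)


# Latent R^2 is the squared latent correlation; AUC should rise with it.
for r in (0.0, 0.3, 0.6, 0.9):
    y, x = simulate_sgc(20000, r)
    print(f"latent r={r:.1f}  latent R^2={r**2:.2f}  AUC={empirical_auc(y, x):.3f}")
```

Because the empirical AUC depends on the predictor only through its ranks, replacing `x` with any strictly increasing function of `x` leaves it unchanged, which is the sense in which both quantities are scale-invariant in this sketch.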
