# ROC and AUC with a Binary Predictor: a Potentially Misleading Metric

**Authors:** John Muschelli

arXiv: 1903.04881 · 2020-08-10

## TL;DR

This paper examines how the common practice of interpolating ROC curves for binary predictors can lead to misleading AUC estimates, highlighting the importance of reporting interpolation methods used.

## Contribution

The paper shows that linearly interpolating the ROC curve of a binary predictor can distort the AUC, and it recommends reporting the interpolation method so the value can be interpreted correctly.

## Key findings

- Linear interpolation between ROC points, the most common choice in software, can make the AUC of a binary predictor misleading.
- The paper compares how R, Python, Stata, and SAS implementations compute the AUC.
- Step-function ("pessimistic") interpolation is discussed as an alternative for binary predictors.

## Abstract

In analysis of binary outcomes, the receiver operator characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about the performance over a series of thresholds and can be summarized by the area under the curve (AUC), a single number. When a predictor is categorical, the ROC curve has one less than the number of categories as potential thresholds; when the predictor is binary, there is only one threshold. As the AUC may be used in decision-making processes to determine the best model, it is important to discuss how it agrees with the intuition from the ROC curve. We discuss how the interpolation of the curve between thresholds with binary predictors can largely change the AUC. Overall, we show that the AUC most commonly estimated in software corresponds to a linear interpolation of the ROC curve with binary predictors, which we believe can lead to misleading results. We compare R, Python, Stata, and SAS software implementations. We recommend reporting the interpolation used and discuss the merit of using the step function interpolator, also referred to as the "pessimistic" approach by Fawcett (2006).
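To make the effect of the interpolation choice concrete, the sketch below computes both areas for a single-threshold ROC curve. It is an illustration, not the paper's code: the function names (`roc_point`, `auc_linear`, `auc_step`) and the toy data are hypothetical, and the step-function area is computed under the assumption that the "pessimistic" curve stays at 0 until the observed false positive rate, then at the observed true positive rate until 1.

```python
def roc_point(y_true, x):
    """Return (FPR, TPR) for a 0/1 predictor x against 0/1 outcomes y_true."""
    tp = sum(1 for y, p in zip(y_true, x) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, x) if y == 0 and p == 1)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def auc_linear(fpr, tpr):
    """Area under straight lines joining (0, 0), (fpr, tpr), and (1, 1)."""
    left = 0.5 * fpr * tpr                # triangle under the first segment
    right = 0.5 * (1 - fpr) * (tpr + 1)   # trapezoid under the second segment
    return left + right                   # algebraically (1 + tpr - fpr) / 2


def auc_step(fpr, tpr):
    """Area under the assumed pessimistic step curve: a tpr-high rectangle from fpr to 1."""
    return tpr * (1 - fpr)


# Made-up toy data: 4 positives (3 detected), 4 negatives (1 false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
x      = [1, 1, 1, 0, 1, 0, 0, 0]

fpr, tpr = roc_point(y_true, x)
linear = auc_linear(fpr, tpr)
step = auc_step(fpr, tpr)
print(f"FPR={fpr}, TPR={tpr}, linear AUC={linear}, step AUC={step}")
```

With these toy numbers the linear area equals the average of sensitivity and specificity, while the step area equals their product, so the same binary predictor yields two noticeably different AUC values depending only on the interpolation.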

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.04881/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1903.04881/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1903.04881/full.md

---
Source: https://tomesphere.com/paper/1903.04881