Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
David M. W. Powers

TL;DR
This paper critiques common evaluation metrics such as precision and recall, and proposes informedness, markedness, and correlation as less biased measures that reflect how far a prediction beats chance, with an outlined extension from the binary to the multi-class case.
Contribution
It discusses informedness and introduces markedness as its dual, presenting both as chance-corrected evaluation measures, connects them with correlation and significance, and extends these concepts from binary to multi-class classification.
Findings
Informedness and markedness provide less biased performance estimates than Recall, Precision, F-Measure and Rand Accuracy.
Connections between evaluation measures and statistical significance are established.
Extension of measures to multi-class classification is outlined.
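The bias described in the findings can be illustrated with a small sketch for the binary case. It uses the standard definitions Informedness = Recall + Inverse Recall - 1 and Markedness = Precision + Inverse Precision - 1, with the correlation taken as their geometric mean. The function name binary_metrics and the two contingency tables are hypothetical, chosen only for illustration, and the sketch assumes every row and column total of the table is nonzero.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Common and chance-corrected measures for a 2x2 contingency table.

    Assumes every marginal (tp+fn, tn+fp, tp+fp, tn+fn) is nonzero.
    """
    recall = tp / (tp + fn)              # true positive rate
    inverse_recall = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)           # positive predictive value
    inverse_precision = tn / (tn + fn)   # negative predictive value

    informedness = recall + inverse_recall - 1
    markedness = precision + inverse_precision - 1

    # Informedness and markedness share the sign of tp*tn - fp*fn,
    # so their product is non-negative up to rounding error.
    magnitude = sqrt(max(informedness * markedness, 0.0))
    correlation = magnitude if informedness >= 0 else -magnitude

    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "informedness": informedness,
        "markedness": markedness,
        "correlation": correlation,
    }


# Hypothetical data: 90 positive and 10 negative cases.
# System A labels almost everything positive; system B discriminates better.
system_a = binary_metrics(tp=88, fp=9, fn=2, tn=1)
system_b = binary_metrics(tp=72, fp=1, fn=18, tn=9)

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

On this made-up data, system A scores higher than system B on recall, F1 and accuracy, yet its informedness is much lower, because it rejects only one of the ten negative cases.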
Abstract
Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance (Informedness), and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general…
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Software Engineering Research · Advanced Text Analysis Techniques
