Testing the Consistency of Performance Scores Reported for Binary Classification Problems
Attila Fazekas, György Kovács

TL;DR
This paper introduces numerical techniques to verify the consistency of reported performance scores in binary classification, ensuring research integrity without relying on statistical inference.
Contribution
It presents a novel, non-statistical numerical method to detect inconsistencies in reported classification performance metrics.
Findings
Effective detection of inconsistencies in medical classification data
Applicable across various scientific domains
Open-source Python package available
Abstract
Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for research ranking. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup. Importantly, the…
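To illustrate the core idea that, for a fixed number of positive and negative test items, reported scores can only take certain interrelated values, here is a minimal brute-force sketch. It is not the authors' method or the API of their package; the function name `consistent_configurations`, the score keys, and the assumption that scores were rounded to a known number of decimals are all hypothetical choices made for this example. It enumerates every possible count of true positives and true negatives and keeps those whose accuracy, sensitivity, and specificity agree with the reported values up to rounding error.

```python
def consistent_configurations(p, n, reported, decimals):
    """Return all (tp, tn) pairs compatible with the reported scores.

    p, n      -- number of positive and negative test items (assumed known)
    reported  -- dict with any of the keys "acc", "sens", "spec"
    decimals  -- number of decimals the scores were rounded to (assumption)
    """
    # Largest deviation that rounding to `decimals` places can introduce,
    # plus a tiny slack for floating-point noise.
    eps = 10 ** (-decimals) / 2 + 1e-12

    matches = []
    for tp in range(p + 1):
        for tn in range(n + 1):
            scores = {
                "acc": (tp + tn) / (p + n),
                "sens": tp / p,
                "spec": tn / n,
            }
            # Keep the pair only if every reported score is reproduced.
            if all(abs(scores[k] - v) <= eps for k, v in reported.items()):
                matches.append((tp, tn))
    return matches


# Hypothetical setup: 100 positives, 200 negatives, scores given to 2 decimals.
ok = consistent_configurations(
    100, 200, {"acc": 0.85, "sens": 0.85, "spec": 0.85}, decimals=2
)
bad = consistent_configurations(
    100, 200, {"acc": 0.95, "sens": 0.85, "spec": 0.85}, decimals=2
)
print("consistent:", bool(ok))
print("inconsistent:", not bad)
```

An empty result means that no confusion matrix with the assumed test-set size can produce the reported scores, so either the scores or the assumed setup must be wrong. The exhaustive loop costs on the order of p times n evaluations, which is fine for a single small test set but does not cover aggregated cross-validation results.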
Taxonomy
Topics: Advanced Statistical Methods and Models · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
