mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning
György Kovács, Attila Fazekas

TL;DR
This paper introduces mlscorecheck, an open-source tool that uses numerical techniques to verify the consistency of reported machine learning performance scores and experimental setups, addressing reproducibility issues.
Contribution
The paper presents novel numerical methods and an open-source package for systematically validating reported machine learning results against experimental details.
Findings
Effective detection of inconsistencies in reported scores
Identification of common flaws in specific fields like retina imaging
Facilitation of reproducibility and validation in ML research
Abstract
Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
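The idea of a consistency test can be illustrated with a minimal sketch for the simplest case: binary classification evaluated on a single test set. This is not the mlscorecheck API. The function name `consistent`, its parameters, and the brute-force enumeration are assumptions made for illustration only. Given the number of positives `p` and negatives `n`, the sketch checks whether any integer pair of true positives and true negatives reproduces all reported scores within a rounding tolerance `eps`.

```python
from itertools import product


def consistent(p, n, scores, eps):
    """Hypothetical helper: return True if some integer (tp, tn) pair
    reproduces every reported score within the tolerance eps.

    p, n   -- number of positive / negative test samples
    scores -- dict with any of the keys 'acc', 'sens', 'spec'
    eps    -- numerical uncertainty of the reported (rounded) scores
    """
    for tp, tn in product(range(p + 1), range(n + 1)):
        computed = {
            "acc": (tp + tn) / (p + n),  # accuracy
            "sens": tp / p,              # sensitivity (recall)
            "spec": tn / n,              # specificity
        }
        if all(abs(computed[k] - v) <= eps for k, v in scores.items()):
            return True
    # No confusion matrix matches: the scores and the setup contradict each other.
    return False


# Scores that a real confusion matrix (tp=90, tn=180) can produce
print(consistent(100, 200, {"acc": 0.9, "sens": 0.9, "spec": 0.9}, 1e-4))
# Accuracy that cannot coexist with the reported sensitivity and specificity
print(consistent(100, 200, {"acc": 0.95, "sens": 0.9, "spec": 0.9}, 1e-4))
```

Exhaustive enumeration is only practical for small test sets and a single evaluation. Setups such as k-fold cross-validation or score aggregation over several datasets need a different formulation, which this sketch does not attempt.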
Taxonomy
Topics: Imbalanced Data Classification Techniques · Retinal Imaging and Analysis
