mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning

György Kovács; Attila Fazekas

arXiv:2311.07541 · cs.LG · November 14, 2023 · 1 citation

TL;DR

This paper introduces mlscorecheck, an open-source tool that uses numerical techniques to verify the consistency of reported machine learning performance scores and experimental setups, addressing reproducibility issues.

Contribution

The paper presents novel numerical methods and an open-source package for systematically validating reported machine learning results against experimental details.

Findings

01 Effective detection of inconsistencies in reported scores

02 Identification of common flaws in specific fields like retina imaging

03 Facilitation of reproducibility and validation in ML research

Abstract

Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
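To make the idea of a consistency test concrete, here is a minimal sketch for the simplest case: a binary classifier evaluated on a single test set with known class counts. It asks whether any integer confusion matrix could produce the reported accuracy, sensitivity, and specificity within their rounding uncertainty. Brute-force enumeration is used here purely for illustration; the function name `is_consistent`, its parameters, and the example numbers are hypothetical and are not the mlscorecheck API, nor necessarily the numerical technique the authors use.

```python
from itertools import product


def is_consistent(p, n, scores, eps):
    """Return True if some integer confusion matrix matches all reported scores.

    p, n   : number of positive and negative samples in the test set
    scores : dict with any of 'acc', 'sens', 'spec' (reported, rounded values)
    eps    : numerical uncertainty of the reported values
             (assumption: 1e-4 covers scores reported to 4 decimal places)
    """
    formulas = {
        "acc": lambda tp, tn: (tp + tn) / (p + n),
        "sens": lambda tp, tn: tp / p,
        "spec": lambda tp, tn: tn / n,
    }
    slack = 1e-12  # guards against floating-point noise at the interval edge

    # Enumerate every feasible (tp, tn) pair and test it against all scores.
    for tp, tn in product(range(p + 1), range(n + 1)):
        if all(
            abs(formulas[name](tp, tn) - value) <= eps + slack
            for name, value in scores.items()
        ):
            return True
    return False


# Hypothetical report: 100 positives, 200 negatives, scores to 4 decimals.
feasible = is_consistent(100, 200, {"acc": 0.8333, "sens": 0.8, "spec": 0.85}, 1e-4)
infeasible = is_consistent(100, 200, {"acc": 0.9, "sens": 0.8, "spec": 0.85}, 1e-4)
```

In the first call, tp = 80 and tn = 170 reproduce all three scores, so the report is feasible. In the second, sensitivity and specificity pin down tp = 80 and tn = 170, which gives an accuracy of about 0.8333 rather than 0.9, so no confusion matrix fits and the reported scores cannot all be correct for that setup. A failed test proves an inconsistency; a passed test only means none was found.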

Code & Models

Repositories

falsenegativelab/mlscorecheck (official)

Taxonomy

Topics: Imbalanced Data Classification Techniques · Retinal Imaging and Analysis