meval: A Statistical Toolbox for Fine-Grained Model Performance Analysis
Dishantkumar Sutariya, Eike Petersen

TL;DR
Meval is a statistical toolbox designed to rigorously analyze machine learning model performance across subgroups, addressing challenges like metric selection, uncertainty estimation, multiple comparisons, and intersectional subgroup discovery, especially in medical imaging.
Contribution
The paper introduces a comprehensive statistical toolbox tailored for subgroup performance analysis in machine learning, with a focus on medical imaging applications.
Findings
Effective identification of subgroup performance disparities.
Application to skin lesion and chest X-ray datasets.
Rigorous statistical assessment of model performance differences.
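One way to make the "rigorous statistical assessment" of a subgroup gap concrete is to compute a prevalence-independent metric per subgroup and put a confidence interval on the difference. The sketch below is a generic plain-Python illustration and not meval's API. It computes AUROC through the Mann-Whitney U statistic and a percentile-bootstrap interval for the AUROC difference between two subgroups. The function names `auroc`, `_resample` and `bootstrap_auroc_diff` are hypothetical. Resampling positives and negatives separately within each group is an assumed design choice that keeps every replicate well defined.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def _resample(labels, scores, rng):
    """Bootstrap resample within each class so both classes are always present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    pos_b = [rng.choice(pos) for _ in pos]
    neg_b = [rng.choice(neg) for _ in neg]
    return [1] * len(pos_b) + [0] * len(neg_b), pos_b + neg_b


def bootstrap_auroc_diff(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for AUROC(a) - AUROC(b).

    Each group is a (labels, scores) pair; the groups are resampled independently.
    """
    rng = random.Random(seed)
    point = auroc(*group_a) - auroc(*group_b)
    diffs = sorted(
        auroc(*_resample(*group_a, rng)) - auroc(*_resample(*group_b, rng))
        for _ in range(n_boot)
    )
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lo, hi)


# Toy data: group A is perfectly ranked, group B's scores are uninformative.
point, (lo, hi) = bootstrap_auroc_diff(
    ([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]),
    ([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]),
)
# point == 0.5 and the interval collapses to (0.5, 0.5) for this degenerate toy data.
```

An interval that excludes zero suggests the gap is unlikely to be due to chance alone at that level. This reading holds for a single comparison only; many simultaneous subgroup comparisons additionally need a multiplicity correction.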
Abstract
Analyzing machine learning model performance stratified by patient and recording properties is becoming the accepted norm and often yields crucial insights into important model failure modes. Performing such analyses in a statistically rigorous manner is non-trivial, however. Appropriate performance metrics must be selected that allow for valid comparisons between groups of different sample sizes and base rates; metric uncertainty must be determined and multiple comparisons must be corrected for, in order to assess whether any observed differences may be purely due to chance; and in the case of intersectional analyses, mechanisms must be implemented to find the most 'interesting' subgroups within combinatorially many subgroup combinations. We here present a statistical toolbox that addresses these challenges and enables practitioners to easily yet rigorously assess their models for…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Single-cell and spatial transcriptomics
