Metric Hub: A metric library and practical selection workflow for use-case-driven data quality assessment in medical AI

Katinka Becker; Maximilian P. Oppelt; Tobias S. Zech; Martin Seyferth; Sandie Cabon; Vanja Miskovic; Ivan Cimrak; Michal Kozubek; Giuseppe D'Avenio; Ilaria Campioni; Jana Fehr; Kanjar De; Ismail Mahmoudi; Emilio Dolgener Cantu; Laurenz Ottmann; Andreas Klaß; Galaad Altares; Jackie Ma; Alireza Salehi M.; Nadine R. Lang-Richter; Tobias Schaeffter; Daniel Schwabe

arXiv:2601.22702 · cs.LG · February 2, 2026

TL;DR

This paper introduces Metric Hub, a comprehensive library of data quality metrics and a workflow for selecting appropriate metrics to evaluate data suitability in medical AI, aiming to enhance trustworthiness and regulatory compliance.

Contribution

The paper operationalizes the METRIC-framework by providing a metric library with detailed metric cards and decision strategies for use-case-driven data quality assessment in medical AI.

Findings

01 Demonstrated on the PTB-XL ECG dataset

02 Supports fit-for-purpose data evaluation

03 Facilitates trustworthy AI development

Abstract

Machine learning (ML) in medicine has transitioned from research to concrete applications aimed at supporting several medical purposes like therapy selection, monitoring and treatment. Acceptance and effective adoption by clinicians and patients, as well as regulatory approval, require evidence of trustworthiness. A major factor for the development of trustworthy AI is the quantification of data quality for AI model training and testing. We have recently proposed the METRIC-framework for systematically evaluating the suitability (fit-for-purpose) of data for medical ML for a given task. Here, we operationalize this theoretical framework by introducing a collection of data quality metrics - the metric library - for practically measuring data quality dimensions. For each metric, we provide a metric card with the most important information, including definition, applicability, examples,…
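
To make the idea of a metric card concrete, the following is a minimal sketch in Python. It is not the paper's schema or implementation: the `MetricCard` class, its field names (taken loosely from the items the abstract lists: definition, applicability, examples), the `completeness` function, and the sample values are all assumptions chosen for illustration. Completeness, the share of non-missing entries, is used only as a simple, widely known data quality measure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MetricCard:
    """Hypothetical metric card; fields mirror items named in the abstract."""
    name: str
    definition: str
    applicability: str
    examples: tuple[str, ...] = ()


def completeness(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries that are not missing (None)."""
    if not values:
        raise ValueError("completeness is undefined for an empty sequence")
    present = sum(1 for v in values if v is not None)
    return present / len(values)


# Illustrative card; the wording is an assumption, not taken from the paper.
card = MetricCard(
    name="completeness",
    definition="Share of non-missing entries in a field.",
    applicability="Fields where missing values are explicitly marked.",
    examples=("4 of 5 values present gives 0.8",),
)

heart_rates = [72.0, None, 65.0, 80.0, 77.0]  # made-up values
score = completeness(heart_rates)
print(f"{card.name}: {score:.2f}")  # prints "completeness: 0.80"
```

Keeping the descriptive card separate from the function that computes the score is one possible design; it lets the same description accompany different implementations of a metric.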

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Data Quality and Management