What Does It Mean for a Medical AI System to Be Right?
Antony Gitau

TL;DR
This paper examines what correctness means for medical AI in diagnostic contexts, arguing that it spans several dimensions: data quality, explainability, clinical relevance, and accountability.
Contribution
It introduces a multi-dimensional framework, grounded in philosophy and ethics, for understanding correctness in medical AI beyond standard benchmark performance.
Findings
Correctness involves data quality, explainability, clinically meaningful metrics, and accountability.
Ground truth labels are often unstable and subjective.
Standard clinical metrics may be inadequate for assessing AI correctness.
Abstract
This paper examines what it means for a medical AI system to be right by grounding the question in a specific clinical context: the automatic classification of plasma cells in digitized bone marrow smears for the diagnosis of multiple myeloma. Drawing on philosophy of science and research ethics, the paper argues that correctness in medical AI is not a single property reducible to benchmark performance, but a multi-dimensional concept involving the availability of expertly labeled medical datasets, the explainability and interpretability of model outputs, the clinical meaningfulness of evaluation metrics, and the distribution of accountability in human-AI workflows. The paper develops this argument through four interrelated themes: the instability of ground truth labels, the opacity of overconfident AI, the inadequacy of standard clinical metrics, and the risk of automation…
