Evaluating the Impact of Medical Image Reconstruction on Downstream AI Fairness and Performance
Matteo Wohlrapp, Niklas Bubeck, Daniel Rueckert, William Lotter

TL;DR
This study presents an evaluation framework for medical image reconstruction models. It finds that reconstruction has limited impact on downstream diagnostic accuracy but can influence fairness, so reconstruction models should be assessed on downstream tasks and not only on pixel-level metrics.
Contribution
The paper introduces a scalable evaluation framework that assesses downstream diagnostic performance and fairness of various reconstruction models across multiple tasks and data types.
Findings
Reconstruction quality metrics poorly predict diagnostic task performance.
Diagnostic accuracy remains largely stable even as reconstruction PSNR declines with increasing image noise.
Reconstruction can modestly affect demographic biases, sometimes amplifying them.
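The first two findings can be illustrated with a toy sketch. It is not the paper's framework: the synthetic "scans", the Gaussian noise model, and the threshold "classifier" (`make_image`, `add_noise`, `classify`) are all hypothetical stand-ins, chosen so that the task signal is robust to pixel noise by construction. The sketch only shows how a pixel-level metric and a task-level metric can be computed side by side and why they need not move together.

```python
import math
import random


def psnr(reference, degraded, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def make_image(has_finding, size=64):
    """Hypothetical toy 'scan': flat intensity, brighter when a finding is present."""
    return [0.7 if has_finding else 0.3] * size


def add_noise(image, sigma, rng):
    """Add Gaussian pixel noise (an assumed noise model) and clip to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]


def classify(image):
    """Stand-in 'diagnostic model': thresholds the mean intensity."""
    return sum(image) / len(image) > 0.5


def evaluate(sigma, n_images=200, seed=0):
    """Return (mean PSNR in dB, classification accuracy) at one noise level."""
    rng = random.Random(seed)
    psnrs, correct = [], 0
    for i in range(n_images):
        label = i % 2 == 0
        clean = make_image(label)
        noisy = add_noise(clean, sigma, rng)
        psnrs.append(psnr(clean, noisy))
        correct += classify(noisy) == label
    return sum(psnrs) / n_images, correct / n_images


results = {sigma: evaluate(sigma) for sigma in (0.02, 0.05, 0.1)}
for sigma, (mean_psnr, acc) in results.items():
    print(f"sigma={sigma:.2f}  mean PSNR={mean_psnr:.1f} dB  accuracy={acc:.2f}")
```

In this toy setting PSNR falls as `sigma` grows while accuracy does not change, because the classifier averages over many pixels. Real reconstruction and diagnostic models are far less cleanly separable, so the sketch illustrates the measurement setup, not the size of the effect.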
Abstract
AI-based image reconstruction models are increasingly deployed in clinical workflows to improve image quality from noisy data, such as low-dose X-rays or accelerated MRI scans. However, these models are typically evaluated using pixel-level metrics like PSNR, leaving their impact on downstream diagnostic performance and fairness unclear. We introduce a scalable evaluation framework that applies reconstruction and diagnostic AI models in tandem, which we apply to two tasks (classification, segmentation), three reconstruction approaches (U-Net, GAN, diffusion), and two data types (X-ray, MRI) to assess the potential downstream implications of reconstruction. We find that conventional reconstruction metrics poorly track task performance, where diagnostic accuracy remains largely stable even as reconstruction PSNR declines with increasing image noise. Fairness metrics exhibit greater…
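The visible portion of the abstract does not say which fairness metrics the authors use. As a hedged illustration only, one common choice is the gap in true positive rate (TPR) between demographic groups, computed once on predictions from the original images and once on predictions from reconstructed images. The function names, group labels, and all values below are invented for the example.

```python
def true_positive_rate(labels, predictions):
    """Fraction of positive cases (label 1) that were predicted positive."""
    hits = [p for y, p in zip(labels, predictions) if y == 1]
    if not hits:
        return float("nan")
    return sum(hits) / len(hits)


def tpr_gap(labels, predictions, groups):
    """Largest TPR difference between any two groups, plus per-group rates."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate(
            [labels[i] for i in idx], [predictions[i] for i in idx]
        )
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical data: 12 cases, two groups of six.
labels = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
groups = ["A"] * 6 + ["B"] * 6
preds_original = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
preds_reconstructed = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]

gap_original, rates_original = tpr_gap(labels, preds_original, groups)
gap_reconstructed, rates_reconstructed = tpr_gap(labels, preds_reconstructed, groups)
print("original:     ", rates_original, "gap =", gap_original)
print("reconstructed:", rates_reconstructed, "gap =", gap_reconstructed)
```

In this made-up example the overall number of detected positives is the same before and after reconstruction (six of eight), yet the gap between groups widens. That is the kind of shift an aggregate accuracy figure would hide.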
