Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification
Simon Baur, Wojciech Samek, Jackie Ma

TL;DR
This paper benchmarks various uncertainty quantification methods in multi-label chest X-ray classification, analyzing their effectiveness and ability to disentangle epistemic and aleatoric uncertainties in real medical data.
Contribution
It provides the first extensive benchmark of 13 uncertainty methods on real medical imaging data, including extensions of existing methods to multi-label tasks.
Findings
Certain methods outperform others in uncertainty estimation accuracy.
Architecture impacts the effectiveness of uncertainty quantification.
Disentangling epistemic and aleatoric uncertainties varies by method and architecture.
Abstract
Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretical approach in synthetic or well-defined data settings such as natural image classification, its applicability to real-life medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our…
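To make the information-theoretical approach mentioned in the abstract concrete, here is a minimal sketch of the commonly used entropy-based decomposition, written for a multi-label setting. It is not taken from the paper: it assumes each label is treated as an independent Bernoulli variable and that `probs` holds sigmoid outputs from several ensemble members or stochastic forward passes. The function names `binary_entropy` and `decompose_uncertainty` and the array layout are hypothetical choices for illustration; the estimators actually used in the benchmark may differ.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def decompose_uncertainty(probs):
    """Split per-label predictive uncertainty into aleatoric and epistemic parts.

    probs: array of shape (members, samples, labels) with per-label sigmoid
    probabilities from several ensemble members or stochastic passes
    (assumed layout, not prescribed by the paper).

    Returns three arrays of shape (samples, labels):
      total     = H[ mean_m p_m ]   (predictive entropy)
      aleatoric = mean_m H[ p_m ]   (expected entropy)
      epistemic = total - aleatoric (mutual information)
    """
    probs = np.asarray(probs, dtype=float)
    total = binary_entropy(probs.mean(axis=0))
    aleatoric = binary_entropy(probs).mean(axis=0)
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Members agree that the label is a coin flip: uncertainty is purely aleatoric.
agree = np.full((5, 1, 2), 0.5)
# Members are individually confident but contradict each other: mostly epistemic.
disagree = np.array([0.01, 0.99, 0.01, 0.99]).reshape(4, 1, 1)

total_a, alea_a, epi_a = decompose_uncertainty(agree)
total_d, alea_d, epi_d = decompose_uncertainty(disagree)
```

In the first toy case the epistemic term is zero because every member predicts the same probability; in the second the mean prediction is again 0.5, but almost all of the predictive entropy is attributed to disagreement between members.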
