An Empirical Analysis of Calibration and Selective Prediction in Multimodal Clinical Condition Classification

L. Julián Lechuga López; Farah E. Shamout; Tim G. J. Rudner

arXiv:2603.02719·cs.LG·May 13, 2026

TL;DR

This study empirically evaluates how reliable uncertainty-based selective prediction is for multimodal clinical condition classification, and finds that class-dependent miscalibration undermines its value as a safety mechanism.

Contribution

It identifies a task-specific failure mode of selective prediction due to class-dependent miscalibration in multimodal clinical AI models.

Findings

01

Selective prediction can substantially degrade performance even when standard evaluation metrics are strong.

02

Models exhibit severe class-dependent miscalibration, especially for underrepresented conditions.

03

Standard aggregate metrics may hide these calibration issues.

Abstract

As artificial intelligence systems move toward clinical deployment, ensuring reliable prediction behavior is fundamental for safety-critical decision-making tasks. One proposed safeguard is selective prediction, where models can defer uncertain predictions to human experts for review. In this work, we empirically evaluate the reliability of uncertainty-based selective prediction in multilabel clinical condition classification using multimodal ICU data. Across a range of state-of-the-art unimodal and multimodal models, we find that selective prediction can substantially degrade performance despite strong standard evaluation metrics. This failure is driven by severe class-dependent miscalibration, whereby models assign high uncertainty to correct predictions and low uncertainty to incorrect ones, particularly for underrepresented clinical conditions. Our results show that commonly used…
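The failure mode described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names (`confidence`, `selective_accuracy`, `expected_calibration_error`), the use of max(p, 1 - p) as the confidence score, the 0.5 decision threshold, and the toy arrays are all assumptions made for illustration. The sketch treats a single label of a multilabel problem as a binary task, keeps only the most confident fraction of predictions, and measures accuracy on what remains.

```python
import numpy as np


def confidence(probs):
    """Confidence of a binary (per-label) prediction: max(p, 1 - p).

    Assumption: this is one common uncertainty proxy, not necessarily
    the one used in the paper.
    """
    probs = np.asarray(probs, dtype=float)
    return np.maximum(probs, 1.0 - probs)


def selective_accuracy(probs, labels, coverage):
    """Accuracy on the `coverage` fraction of most-confident predictions.

    The remaining predictions are treated as deferred to a human expert.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    correct = (probs >= 0.5).astype(int) == labels
    n_keep = max(1, int(round(coverage * len(probs))))
    # Stable sort so ties are broken by original order.
    keep = np.argsort(-confidence(probs), kind="stable")[:n_keep]
    return float(correct[keep].mean())


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error for one label.

    Compares the mean predicted probability with the observed
    positive rate in each equal-width probability bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap
    return float(ece)


# Hypothetical toy label: the model is most confident where it is wrong.
toy_probs = np.array([0.9, 0.9, 0.6, 0.6])
toy_labels = np.array([0, 0, 1, 1])

full_acc = selective_accuracy(toy_probs, toy_labels, coverage=1.0)
half_acc = selective_accuracy(toy_probs, toy_labels, coverage=0.5)
toy_ece = expected_calibration_error(toy_probs, toy_labels)
```

On these toy arrays, accuracy falls from 0.5 at full coverage to 0.0 at 50% coverage, because the two retained predictions are the confidently wrong ones. Computing the calibration error separately per label, rather than pooling all labels, is what would expose a class-dependent problem of this kind.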
