MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data
Baraa Al Jorf, Farah Shamout

TL;DR
MedPatch is a multi-stage fusion architecture that integrates heterogeneous multimodal clinical data (time-series, images, and text reports) via confidence-guided patching, targeting clinical prediction tasks where data is limited and modalities are often missing.
Contribution
Introduces MedPatch, a confidence-guided multi-stage fusion model that handles missing modalities and heterogeneity in multimodal clinical data, achieving state-of-the-art results.
Findings
MedPatch outperforms existing baselines on in-hospital mortality prediction.
It effectively manages missing modalities with a missingness-aware module.
It achieves state-of-the-art results on clinical prediction tasks.
Abstract
Clinical decision-making relies on the integration of information across various data modalities, such as clinical time-series, medical images, and textual reports. Compared to other domains, real-world medical data is heterogeneous in nature, limited in size, and sparse due to missing modalities. This significantly limits model performance in clinical prediction tasks. Inspired by clinical workflows, we introduce MedPatch, a multi-stage multimodal fusion architecture, which seamlessly integrates multiple modalities via confidence-guided patching. MedPatch comprises three main components: (i) a multi-stage fusion strategy that leverages joint and late fusion simultaneously, (ii) a missingness-aware module that handles sparse samples with missing modalities, and (iii) a joint fusion module that clusters latent token patches based on calibrated unimodal token-level confidence. We evaluated…
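To make the idea of grouping tokens by calibrated confidence more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not specify these details: the use of temperature scaling as the calibration step, the fixed threshold, mean pooling, the zero-filled placeholder with a presence flag for missing modalities, and all function and modality names (`token_confidence`, `split_by_confidence`, `fuse`, `"timeseries"`, `"image"`) are assumptions made purely for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def token_confidence(token_logits, temperature=1.0):
    """Confidence per token: the largest class probability after scaling.

    Assumption: temperature scaling stands in for whatever calibration
    the paper actually uses.
    """
    return softmax(token_logits, temperature).max(axis=-1)


def split_by_confidence(tokens, confidence, threshold=0.75):
    """Mean-pool tokens into one high- and one low-confidence patch.

    An empty group is represented by a zero vector (illustrative choice).
    """
    tokens = np.asarray(tokens, dtype=float)
    high = np.asarray(confidence) >= threshold
    dim = tokens.shape[-1]

    def pool(mask):
        return tokens[mask].mean(axis=0) if mask.any() else np.zeros(dim)

    return pool(high), pool(~high)


def fuse(sample, dims, threshold=0.75):
    """Concatenate per-modality patches plus a presence flag.

    sample: modality name -> (tokens, token_logits), or None if missing.
    dims:   modality name -> token embedding size.
    A missing modality contributes zeros and a presence flag of 0.
    """
    parts = []
    for name in sorted(dims):
        entry = sample.get(name)
        if entry is None:
            parts += [np.zeros(dims[name]), np.zeros(dims[name]), np.zeros(1)]
            continue
        tokens, logits = entry
        high, low = split_by_confidence(tokens, token_confidence(logits), threshold)
        parts += [high, low, np.ones(1)]
    return np.concatenate(parts)


# Hypothetical toy sample: three time-series tokens, image modality missing.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
logits = np.array([[4.0, 0.0], [0.1, 0.0], [0.0, 5.0]])
fused = fuse({"timeseries": (tokens, logits), "image": None},
             dims={"timeseries": 2, "image": 2})
print(fused)
```

In this toy sample the first and third tokens clear the threshold and are pooled together, while the second forms the low-confidence patch. A real model would presumably feed such grouped patches into a learned joint fusion module rather than concatenating them directly.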
