Multimodal Fusion of Histopathology Images and Electronic Health Records for Early Breast Cancer Diagnosis
Aditya Shribhagwan Khandelwal, Mohammad Samar Ansari, Asra Aslam

TL;DR
This study develops a multimodal framework that combines histopathology images with electronic health records, significantly improving the accuracy and interpretability of early breast cancer diagnosis over unimodal models.
Contribution
It introduces an integrated multimodal approach that outperforms its unimodal counterparts, demonstrating the value of combining image and clinical data for breast cancer diagnosis.
Findings
ResNet-18 achieves near-perfect accuracy and AUC on image classification.
XGBoost attains 98% accuracy on EHR prediction.
The fusion model achieves a macro-AUC of 0.997, surpassing the unimodal baselines.
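For readers unfamiliar with the metric, the sketch below shows one common way a macro-AUC is computed: a one-vs-rest AUC per class, averaged with equal weight. The summary does not say which definition or how many classes the authors used, so the one-vs-rest choice, the function names `binary_auc` and `macro_auc`, and the toy labels and probabilities are all assumptions made for illustration, not the paper's evaluation code.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, probs):
    """Unweighted mean of one-vs-rest AUCs (assumed definition of macro-AUC)."""
    n_classes = len(probs[0])
    per_class = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]   # class c vs. the rest
        scores = [row[c] for row in probs]              # predicted probability of class c
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes


# Toy data, invented purely for illustration (3 hypothetical classes, 4 samples).
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.2, 0.3],
]
score = macro_auc(y_true, probs)
print(f"macro-AUC = {score:.3f}")
```

Because every class contributes equally to the average, a rare class that is ranked poorly lowers the macro-AUC as much as a common one would.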
Abstract
Breast cancer is a leading cause of cancer-related mortality worldwide, and timely, accurate diagnosis is critical to improving survival outcomes. While convolutional neural networks (CNNs) have demonstrated strong performance on histopathology image classification, and machine learning models on structured electronic health records (EHR) have shown utility for clinical risk stratification, most existing work treats these modalities in isolation. This paper presents a systematic multimodal framework that integrates patch-level histopathology features from the BreCaHAD dataset with structured clinical data from MIMIC-IV. We train and evaluate unimodal image models (a simple CNN baseline and ResNet-18 with transfer learning), unimodal tabular models (XGBoost and a multilayer perceptron), and an intermediate-fusion model that concatenates latent representations from both modalities.…
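As a rough illustration of what intermediate fusion by concatenation looks like, the sketch below joins an image latent vector and an EHR latent vector and passes the result through a small classification head. It is a forward pass only, written in NumPy with random weights and random stand-in latents. The 512-dimensional image latent matches the pooled feature size of a standard ResNet-18, but the EHR latent size, hidden width, class count, and all names (`fuse_and_classify`, `init_linear`) are assumptions for this example, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM = 512    # pooled feature size of a standard ResNet-18
EHR_DIM = 32     # hypothetical size of the tabular (EHR) latent vector
HIDDEN = 64      # hypothetical width of the fusion head
N_CLASSES = 2    # hypothetical number of output classes


def init_linear(in_dim, out_dim):
    """He-style random weights and zero bias for one dense layer."""
    w = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return w, b


W1, B1 = init_linear(IMG_DIM + EHR_DIM, HIDDEN)
W2, B2 = init_linear(HIDDEN, N_CLASSES)


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fuse_and_classify(img_latent, ehr_latent):
    """Concatenate the two latents, then apply a two-layer head (untrained here)."""
    fused = np.concatenate([img_latent, ehr_latent], axis=1)  # (batch, 512 + 32)
    hidden = np.maximum(fused @ W1 + B1, 0.0)                 # ReLU
    return softmax(hidden @ W2 + B2)                          # class probabilities


# Random stand-ins for encoder outputs; real latents would come from trained encoders.
img_latent = rng.normal(size=(4, IMG_DIM))
ehr_latent = rng.normal(size=(4, EHR_DIM))
probs = fuse_and_classify(img_latent, ehr_latent)
print(probs.shape)
```

The point of fusing at the latent level, rather than averaging the two models' final predictions, is that the head can learn interactions between image-derived and clinical features.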
