# Explainable multi-modal deep learning for transparent cancer diagnosis: integrating radiology, clinical features, and decision visualization

**Authors:** Sital Dash, Laxmi Bewoor, Yashwant Dongre, Amol Bhosle, Kailas Patil, Shrikant Jadhav, Banani Mohapatra, Bhavnish Walia

PMC · DOI: 10.3389/frai.2026.1767612 · Frontiers in Artificial Intelligence · 2026-02-23

## TL;DR

This paper introduces an explainable AI framework that combines radiological imaging with structured clinical data to improve both the accuracy and the trustworthiness of cancer diagnosis.

## Contribution

The framework combines radiological images and structured clinical features through attention-based fusion. It produces cross-modally aligned explanations, using Grad-CAM++ for image regions and SHAP for clinical feature contributions.
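The summary does not specify how the attention-based fusion is built. As a minimal sketch, assuming each modality has already been encoded into an embedding of the same length, modality-level attention can score each embedding, softmax-normalize the scores into weights, and return the weighted sum. Feature concatenation, the baseline it is compared against, simply stacks the vectors. The `tanh` scoring, the single scoring vector `w`, and all function names are hypothetical illustrations, not the authors' architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(image_emb: Sequence[float], clinical_emb: Sequence[float]) -> Vector:
    """Baseline: place the embeddings end to end (length d_img + d_clin), no weighting."""
    return list(image_emb) + list(clinical_emb)


def attention_fusion(
    embeddings: Sequence[Sequence[float]], score_vector: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Hypothetical modality-level attention over equal-length embeddings.

    Each embedding e_m gets the score tanh(w . e_m), where w (score_vector) would be
    learned in a real model. Softmax turns the scores into weights that sum to 1,
    and the fused vector is the weighted sum of the embeddings.
    """
    dim = len(score_vector)
    if any(len(e) != dim for e in embeddings):
        raise ValueError("every modality embedding must have len(score_vector) entries")
    scores = [math.tanh(sum(w * x for w, x in zip(score_vector, e))) for e in embeddings]
    weights = softmax(scores)
    fused = [sum(wt * e[i] for wt, e in zip(weights, embeddings)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    image_emb = [0.9, 0.1, 0.4]     # hypothetical image-encoder output
    clinical_emb = [0.2, 0.8, 0.5]  # hypothetical clinical-encoder output
    w = [1.0, -0.5, 0.3]            # hypothetical learned scoring vector
    fused, weights = attention_fusion([image_emb, clinical_emb], w)
    print("modality weights:", weights)
    print("fused:", fused, "| concatenated length:", len(concat_fusion(image_emb, clinical_emb)))
```

Unlike concatenation, this design keeps the fused vector at the shared embedding size and yields per-modality weights that can be inspected alongside the explanations.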

## Key findings

- The model outperformed uni-modal approaches and simple fusion baselines, with an improved balance between sensitivity and specificity.
- Attention-based fusion outperformed feature concatenation, and adding explainability did not reduce predictive accuracy.
- Image and clinical explanations highlighted diagnostically relevant tumor regions and established oncological risk factors.
- Stable performance across datasets indicates strong generalization.
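Grad-CAM++ turns a convolutional layer's feature maps into a class-specific heatmap, which is how image explanations of this kind localize tumor regions. The sketch below implements the commonly used closed-form Grad-CAM++ weighting: exponentiating the class score reduces the second- and third-order derivatives to powers of the first-order gradient g, so each position gets α = g² / (2g² + (ΣA)·g³), where ΣA is the sum of that feature map over all positions. It assumes the feature maps and the gradients of the pre-softmax class score have already been extracted (in practice through a framework's autograd). The target layer used by the authors is not given in this summary, and `grad_cam_pp` is a hypothetical helper name.

```python
from typing import List

Map = List[List[float]]  # one H x W grid


def grad_cam_pp(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM++ heatmap for one class (hypothetical helper).

    activations[k][i][j]: feature map k of the target layer at position (i, j)
    gradients[k][i][j]:   dS/dA for the pre-softmax class score S at the same spot
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for a_k, g_k in zip(activations, gradients):
        a_sum = sum(sum(row) for row in a_k)  # sum of map k over all positions
        weight = 0.0
        for i in range(height):
            for j in range(width):
                g = g_k[i][j]
                denom = 2.0 * g * g + a_sum * g ** 3
                alpha = g * g / denom if denom != 0.0 else 0.0
                weight += alpha * max(g, 0.0)  # only positive gradients contribute
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * a_k[i][j]  # weighted sum of feature maps
    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU on the combined map
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]  # rescale to [0, 1] for display
    return cam
```

The returned map has the spatial size of the chosen feature layer, so it is normally upsampled to the input resolution before being overlaid on the mammogram or MRI slice.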

## Abstract

Although artificial intelligence–based cancer diagnostic models have demonstrated strong predictive performance, their lack of transparency and reliance on single-modality data continue to limit clinical trust and adoption. Effectively integrating multi-modal data with interpretable decision-making remains a key challenge.

We propose an explainable multi-modal deep learning framework that integrates radiological imaging and structured clinical features using attention-based fusion. Image-level explanations are generated with Grad-CAM++, while SHAP quantifies the contributions of clinical features, yielding a unified, cross-modally aligned interpretation rather than independent uni-modal explanations. The framework was evaluated on publicly available datasets, including CBIS-DDSM mammography, Duke Breast Cancer MRI, and TCGA cohorts (BRCA, LUAD, and GBM), comprising a total of 3,842 images from 2,917 patients.
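SHAP attributes a prediction to input features using Shapley values. The `shap` library estimates these with model-specific or sampling-based explainers; for a handful of clinical variables they can also be computed exactly by enumerating every feature coalition, as in this minimal sketch. It assumes that a feature absent from a coalition is replaced by a single baseline value (for example, a cohort mean), which is only one of several possible value functions. The explainer and background data used in the paper are not described in this summary, and the risk model and feature names in the demo are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature absent from a coalition takes its baseline value; present features
    keep the patient's value. Cost grows as 2**n model calls, so only small n is practical.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of i
    return phi


if __name__ == "__main__":
    # Hypothetical clinical features: [age (decades), tumor size (cm), family history (0/1)]
    def risk_score(z: Sequence[float]) -> float:
        logit = -4.0 + 0.3 * z[0] + 0.6 * z[1] + 1.2 * z[2]  # hypothetical coefficients
        return 1.0 / (1.0 + math.exp(-logit))

    patient = [6.2, 2.5, 1.0]
    cohort_mean = [5.5, 1.8, 0.2]
    print(shapley_values(risk_score, patient, cohort_mean))
```

The attributions satisfy the efficiency property: they sum to `model(x) - model(baseline)`, so a per-feature chart accounts for the whole shift in predicted risk for one patient.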

The proposed model consistently outperformed uni-modal approaches and simple fusion baselines, achieving an improved balance between sensitivity and specificity. Attention-based fusion demonstrated superior performance compared with feature concatenation, and the integration of explainability did not compromise predictive accuracy. Visual and clinical explanations highlighted diagnostically relevant tumor regions and established oncological risk factors. Stable performance across datasets indicates strong generalization capability.
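The summary reports an improved balance between sensitivity and specificity but gives no figures. As a minimal sketch of how these two quantities are defined for binary predictions (1 = malignant), the code below computes them from paired labels and adds Youden's J statistic as one common way to choose a decision threshold that balances them. Youden's J is an assumption for illustration; the paper's operating-point selection is not described here.

```python
import math
from typing import Optional, Sequence, Tuple


def sens_spec(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). NaN if undefined."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 1:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def youden_threshold(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[Optional[float], float]:
    """Pick the score cut-off maximizing Youden's J = sensitivity + specificity - 1.

    Predictions are 1 when score >= threshold; ties keep the lowest threshold.
    Returns (None, -inf) if J is never defined (e.g. only one class is present).
    """
    best_t: Optional[float] = None
    best_j = -math.inf
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        sens, spec = sens_spec(y_true, preds)
        j = sens + spec - 1.0
        if j > best_j:  # NaN never compares greater, so undefined J is skipped
            best_t, best_j = t, j
    return best_t, best_j
```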

These results demonstrate that explainable multi-modal learning can effectively combine accuracy, interpretability, and robustness, supporting the development of reliable AI-based decision-support systems for cancer diagnosis.

## Linked entities

- **Diseases:** cancer (MONDO:0004992), breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** BRCA1 (BRCA1 DNA repair associated) [NCBI Gene 672] {aka BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4}
- **Diseases:** Breast (MESH:D061325), BRCA (MESH:D001943), Cancer (MESH:D009369), GBM (MESH:D005910), lesions (MESH:D009059)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12970390/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12970390/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC12970390/full.md

---
Source: https://tomesphere.com/paper/PMC12970390