# MDL-CA: a multimodal deep learning approach with a cross attention mechanism for accurate brain cancer diagnosis

**Authors:** Sumaira Sarwar, Saqib Majeed, Asif Nawaz, Ruqia Bibi, Seung Won Lee

PMC · DOI: 10.3389/fpubh.2025.1687335 · Frontiers in Public Health · 2026-01-05

## TL;DR

This paper introduces MDL-CA, a deep learning framework that fuses genomic and MRI data through cross-attention to improve the accuracy and biological interpretability of brain cancer diagnosis.

## Contribution

The contribution is a cross-attention fusion framework that combines genomic graph embeddings from a Graph Attention Network (GAT) with MRI feature maps from a 3D DenseNet for more accurate and interpretable brain cancer diagnosis.

## Key findings

- MDL-CA achieved accuracies of 96.22%, 97.14%, 98.46%, and 98.21% on four benchmark datasets, with F1-scores ranging from 95.95% to 98.40%.
- The authors report that cross-attention fusion improves biological interpretability and diagnostic precision over single-modality and conventional fusion approaches.
- Performance is consistent across the four datasets, which the authors present as evidence of robustness and generalization.

## Abstract

Brain cancer diagnosis poses a significant clinical challenge due to the complex interplay between molecular mechanisms and anatomical abnormalities. Traditional diagnostic techniques, including invasive biopsies, isolated genomic assays, and standalone Magnetic Resonance Imaging (MRI), often exhibit limitations such as procedural risks, inadequate sensitivity, and incomplete assessment of tumor heterogeneity. These shortcomings contribute to delayed diagnosis, inaccurate tumor grading, and suboptimal treatment planning. Furthermore, single-modality data, whether MRI or genomic profiles, frequently yield limited diagnostic accuracy and biological interpretability.

To address these limitations, this study proposes MDL-CA, a Multimodal Deep Learning framework with a Cross-Attention mechanism, designed to integrate genomic and MRI modalities for enhanced brain cancer diagnosis. The framework fuses genomic graph embeddings, extracted using a Graph Attention Network (GAT), with MRI feature maps derived from a 3D DenseNet. The cross-modal attention fusion mechanism enables the model to capture intricate biological and spatial interactions, producing a biologically informed feature representation. Additionally, the Entmax sigmoid function is employed in the classification stage to promote sparsity and improve interpretability. Data were sourced from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA) following comprehensive preprocessing.
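
The fusion and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head in which genomic node embeddings act as queries over MRI patch features, and it uses sparsemax (the alpha = 2 member of the entmax family) as a stand-in because the abstract does not specify the exact Entmax variant. All shapes, weights, the pooling step, and the three-class output are hypothetical, and plain Python is used instead of a deep learning library to keep the arithmetic visible.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: each query row attends over all key/value rows."""
    q = matmul(queries, w_q)
    k = matmul(keys_values, w_k)
    v = matmul(keys_values, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def sparsemax(logits):
    """Project logits onto the probability simplex; low-scoring classes get exactly zero."""
    cumsum, tau = 0.0, 0.0
    for k, z in enumerate(sorted(logits, reverse=True), start=1):
        cumsum += z
        if 1 + k * z > cumsum:  # class k is still inside the support
            tau = (cumsum - 1) / k
    return [max(z - tau, 0.0) for z in logits]

def random_matrix(rows, cols, rng):
    """Gaussian random matrix standing in for learned features or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
gene_nodes = random_matrix(5, 8, rng)    # hypothetical: 5 GAT node embeddings of size 8
mri_patches = random_matrix(6, 12, rng)  # hypothetical: 6 DenseNet patch features of size 12
w_q = random_matrix(8, 4, rng)           # projections into a shared attention space of size 4
w_k = random_matrix(12, 4, rng)
w_v = random_matrix(12, 4, rng)

fused, weights = cross_attention(gene_nodes, mri_patches, w_q, w_k, w_v)

# Mean-pool the fused rows and score 3 hypothetical classes (assumed head, not from the paper).
pooled = [sum(col) / len(fused) for col in zip(*fused)]
logits = matmul([pooled], random_matrix(4, 3, rng))[0]
probs = sparsemax(logits)
```

Each row of `weights` shows how strongly one genomic node attends to each MRI patch, which is the kind of cross-modal link the abstract associates with interpretability. Unlike softmax, `sparsemax` can assign exactly zero probability to unlikely classes, which is the sparsity property the abstract attributes to the Entmax-based classifier.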

Extensive experiments conducted across four benchmark datasets demonstrated that MDL-CA achieved superior diagnostic performance, with accuracies of 96.22%, 97.14%, 98.46%, and 98.21%, and F1-scores ranging from 95.95% to 98.40%. These results confirm the framework’s robustness, scalability, and consistent generalization across diverse datasets.
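
For readers who want to relate the reported accuracy and F1 figures to their definitions, the following sketch computes both from predicted and true labels. It assumes macro-averaged F1, since the abstract does not state which averaging scheme the authors used, and the labels are made-up toy values rather than data from the paper.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)

# Toy labels for three hypothetical tumor classes (not results from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, so it can diverge from accuracy when classes are imbalanced, which is one reason the two metrics are usually reported side by side.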

The integration of genomic and MRI data through the proposed cross-attention mechanism enables deeper biological understanding and improved diagnostic precision compared to single-modality and conventional fusion approaches. By effectively modeling interactions between molecular and anatomical features, MDL-CA advances the development of biologically informed, multimodal diagnostic systems for brain cancer. The results highlight the framework’s potential to support early diagnosis and personalized treatment planning in clinical practice.

## Linked entities

- **Diseases:** brain cancer (MONDO:0001657)

## Full-text entities

- **Diseases:** Brain cancer (MESH:D001932), Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12812965/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12812965/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12812965/full.md

---
Source: https://tomesphere.com/paper/PMC12812965