Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification

Jianxun Yu; Ruiquan Ge; Zhipeng Wang; Cheng Yang; Chenyu Lin; Xianjun Fu; Jikui Liu; Ahmed Elazab; Changmiao Wang

arXiv:2508.04205 · cs.CV · August 7, 2025

TL;DR

This paper introduces MMCAF-Net, a novel deep learning model that effectively fuses multimodal 3D medical imaging and health record data for lung disease classification, especially improving detection of small lesions.

Contribution

The paper proposes a new multimodal fusion network with multi-scale cross-attention and lesion-specific feature extraction for improved lung disease diagnosis.

Findings

01. Significant accuracy improvement over existing methods

02. Effective handling of dimensionality differences in multimodal data

03. Enhanced detection of small lesions in lung images

Abstract

The diagnosis of medical diseases faces challenges such as the misdiagnosis of small lesions. Deep learning, particularly multimodal approaches, has shown great potential in the field of medical disease diagnosis. However, the differences in dimensionality between medical imaging and electronic health record data present challenges for effective alignment and fusion. To address these issues, we propose the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net). This model employs a feature pyramid structure combined with an efficient 3D multi-scale convolutional attention module to extract lesion-specific features from 3D medical images. To further enhance multimodal data integration, MMCAF-Net incorporates a multi-scale cross-attention module, which resolves dimensional inconsistencies, enabling more effective feature fusion. We evaluated MMCAF-Net on the Lung-PET-CT-Dx…
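The fusion idea in the abstract, a 1D health-record vector attending over 3D image features at several scales, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cross_attention`, `fuse_multiscale`), the single-head design, the choice of the record vector as the query, the feature shapes, and the random projection weights (standing in for learned ones) are all assumptions made for illustration. It only shows how projecting both modalities into a shared width lets a flat record vector be combined with volumes of differing channel counts and resolutions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    query:  (n_q, d_q) features from one modality (here: health records)
    tokens: (n_t, d_t) features from the other modality (here: image voxels)
    Returns the attended features (n_q, d) and attention weights (n_q, n_t).
    """
    q = query @ w_q                      # (n_q, d)
    k = tokens @ w_k                     # (n_t, d)
    v = tokens @ w_v                     # (n_t, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights


def fuse_multiscale(ehr, volumes, d_model=16, seed=0):
    """Hypothetical fusion: attend from the record vector to each image scale.

    ehr:     (d_ehr,) tabular feature vector
    volumes: list of (C, D, H, W) feature maps, one per pyramid level
    Returns a (len(volumes) * d_model,) fused vector.
    """
    rng = np.random.default_rng(seed)    # random weights stand in for learned ones
    query = ehr[None, :]                 # (1, d_ehr): a single query token
    fused = []
    for vol in volumes:
        c = vol.shape[0]
        tokens = vol.reshape(c, -1).T    # flatten voxels: (D*H*W, C)
        w_q = rng.normal(scale=ehr.size ** -0.5, size=(ehr.size, d_model))
        w_k = rng.normal(scale=c ** -0.5, size=(c, d_model))
        w_v = rng.normal(scale=c ** -0.5, size=(c, d_model))
        out, _ = cross_attention(query, tokens, w_q, w_k, w_v)
        fused.append(out[0])             # (d_model,) per scale
    return np.concatenate(fused)


# Toy inputs: an 8-dimensional record vector and three pyramid levels
# whose channel counts and spatial sizes all differ.
rng = np.random.default_rng(42)
ehr = rng.normal(size=8)
volumes = [
    rng.normal(size=(4, 8, 8, 8)),
    rng.normal(size=(8, 4, 4, 4)),
    rng.normal(size=(16, 2, 2, 2)),
]
fused = fuse_multiscale(ehr, volumes)
print(fused.shape)  # (48,)
```

Each pyramid level contributes one `d_model`-wide vector regardless of its resolution, so the fused output has a fixed size that a downstream classifier could consume. In a trained model the projection matrices would be learned parameters, and the attention direction, number of heads, and normalization are design choices the abstract does not specify.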
