# IDF-Net: Interpretable Dynamic Fusion Network for Colorectal Cancer Diagnosis Using Cross-Modal Imaging

**Authors:** Helen Haile Hayeso, Peifeng Shi, Jingwen Lian, Zenebe Markos Lonseko, Nini Rao

PMC · DOI: 10.3390/diagnostics16010099 · Diagnostics · 2025-12-27

## TL;DR

IDF-Net is a multimodal AI model that diagnoses colorectal cancer by dynamically fusing endoscopy, CT, and histopathology images, and it explains its predictions through spatial heatmaps and per-modality attribution.

## Contribution

IDF-Net introduces a dynamically fused, multimodal diagnostic framework with integrated quantitative interpretability.

## Key findings

- IDF-Net achieved a state-of-the-art accuracy of 0.920 and AUC of 0.991 for CRC diagnosis.
- Interpretability analysis showed strong alignment between Grad-CAM++ heatmaps and expert-annotated lesions, quantified with intersection-over-union.
- Dynamic routing and cross-attention fusion improved AUC by 0.038 and 0.046, respectively.
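
The abstract states that saliency alignment was quantified with the intersection-over-union (IoU) metric. The following is a minimal sketch of how such a score can be computed, not the authors' implementation. The function name `saliency_iou`, the 0.5 binarization threshold, and the toy arrays are all assumptions made for illustration; the paper's actual thresholding procedure is not given in this summary.

```python
def saliency_iou(heatmap, lesion_mask, threshold=0.5):
    """IoU between a thresholded saliency heatmap and a binary lesion mask.

    heatmap:     2D list of floats, assumed normalized to [0, 1]
    lesion_mask: 2D list of 0/1 values with the same shape (expert annotation)
    threshold:   hypothetical cut-off used to binarize the heatmap
    """
    intersection = 0
    union = 0
    for heat_row, mask_row in zip(heatmap, lesion_mask):
        for heat, label in zip(heat_row, mask_row):
            salient = heat >= threshold
            lesion = bool(label)
            intersection += salient and lesion  # pixel is in both regions
            union += salient or lesion          # pixel is in either region
    # Two empty regions have no defined overlap; report 0.0 by convention here.
    return intersection / union if union else 0.0


# Toy 3x3 example (made-up values, not data from the paper).
toy_heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
toy_mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
score = saliency_iou(toy_heatmap, toy_mask)
print(round(score, 3))
```

In the toy example, three pixels exceed the threshold and two of them fall inside the annotated lesion, so the intersection is 2 pixels and the union is 3.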

## Abstract

Background/Objectives: Colorectal cancer (CRC) is a leading cause of cancer deaths worldwide, underscoring the need for diagnostic tools that are early, accurate, and clinically interpretable. Current artificial intelligence (AI) models are predominantly unimodal and lack sufficient interpretability, which restricts their clinical adoption. Methods: We propose IDF-Net, an interpretable dynamic fusion framework that integrates endoscopy, computed tomography (CT), and histopathology using modality-specific encoders, a dual-stage adaptive gating mechanism, and cross-modal attention. We conducted stratified 5-fold cross-validation and assessed interpretability using spatial heatmaps and modality attribution. We also quantified the results using the intersection-over-union metric for saliency alignment. Results: IDF-Net achieved a state-of-the-art accuracy of 0.920 (0.907–0.936) and area under the curve (AUC) of 0.991 (95% CI: 0.965–0.997), significantly outperforming unimodal and static-fusion baselines (p < 0.05). Interpretability analysis of IDF-Net demonstrated a strong alignment between Gradient-weighted Class Activation Mapping++ heatmaps and expert-annotated lesions, as well as case-specific modality contributions via SHapley Additive exPlanations values. Ablation studies confirmed the contribution of each component, with dynamic routing and cross-attention fusion improving AUC by 0.038 and 0.046, respectively. Conclusions: IDF-Net introduces a dynamically fused, multimodal diagnostic framework with integrated quantitative interpretability, demonstrating superior accuracy and strong potential for clinical translation in CRC diagnosis. The model’s adaptive design allows it to function robustly even when CT data is unavailable, aligning with common clinical pathways while leveraging additional imaging when present for comprehensive staging.
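
To make the idea of adaptive fusion with a missing modality concrete, here is a deliberately simplified sketch. It is not the paper's dual-stage gating mechanism, whose details are not given in this summary. It assumes a single softmax gate over whichever modalities are present, with gate scores supplied by hand rather than learned. The function name `gated_fusion`, the feature vectors, and the scores are hypothetical.

```python
import math


def gated_fusion(features, gate_logits):
    """Fuse per-modality feature vectors with softmax gate weights.

    features:    dict mapping modality name -> list of floats, or None if
                 that modality is unavailable for this case
    gate_logits: dict mapping modality name -> float score (hypothetical;
                 in a trained model these would come from a learned gate)
    Returns (fused_vector, weights), where weights cover only the
    modalities that are present and sum to 1.
    """
    available = [name for name, vec in features.items() if vec is not None]
    if not available:
        raise ValueError("at least one modality is required")

    # Softmax restricted to available modalities (max-shifted for stability).
    max_logit = max(gate_logits[name] for name in available)
    exps = {name: math.exp(gate_logits[name] - max_logit) for name in available}
    total = sum(exps.values())
    weights = {name: value / total for name, value in exps.items()}

    # Weighted sum of the feature vectors, element by element.
    dim = len(features[available[0]])
    fused = [
        sum(weights[name] * features[name][i] for name in available)
        for i in range(dim)
    ]
    return fused, weights


# Toy case with CT missing (made-up numbers, not data from the paper).
toy_features = {
    "endoscopy": [1.0, 0.0],
    "ct": None,
    "histopathology": [0.0, 1.0],
}
toy_logits = {"endoscopy": 0.0, "ct": 2.0, "histopathology": 0.0}
fused, weights = gated_fusion(toy_features, toy_logits)
print(fused, weights)
```

Because the softmax is computed only over the modalities that are present, the weight that CT would otherwise receive is redistributed to endoscopy and histopathology, which is one simple way a fusion model can keep working when CT is unavailable.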

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** CRC (MESH:D015179), cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12786013/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12786013/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12786013/full.md

---
Source: https://tomesphere.com/paper/PMC12786013