# Prompt-Driven Multimodal Segmentation with Dynamic Fusion for Adaptive and Robust Medical Imaging with Applications to Cancer Diagnosis

**Authors:** Shatha Abed Alsaedi, Hossam Magdy Balaha, Mohamed Farsi, Majed Alwateer, Moustafa M. Aboelnaga, Mohamed Shehata, Mahmoud Badawy, Mostafa A. Elhosseini

PMC · DOI: 10.3390/cancers17223691 · Cancers · 2025-11-18

## TL;DR

This paper introduces a flexible AI system for medical imaging that adapts to natural language instructions and improves cancer diagnosis by dynamically fusing text and images.

## Contribution

The novel contribution is a prompt-driven AI system that dynamically adapts to different imaging types and clinical instructions without retraining.

## Key findings

- Dice loss is optimal for single-organ segmentation tasks.
- Jaccard (IoU) loss performs better in multi-organ and cross-modality cancer segmentation.
- The proposed framework improves adaptability and robustness in medical image segmentation.
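
To make the two objectives compared above concrete, here is a minimal NumPy sketch of soft Dice and soft Jaccard (IoU) losses for a single binary mask. It is an illustration under assumptions, not the paper's implementation: the function names, the `eps` smoothing term, and the flattening over all pixels are choices made here. With overlap scores D (Dice) and J (Jaccard) computed this way, J = D / (2 − D), so the Jaccard loss is never smaller than the Dice loss for the same prediction.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|).

    pred   : predicted foreground probabilities in [0, 1]
    target : ground-truth mask with values in {0, 1}
    eps    : smoothing term (an assumption) to avoid division by zero
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    total = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def soft_jaccard_loss(pred, target, eps=1e-6):
    """Soft Jaccard (IoU) loss: 1 - |P∩T| / |P∪T|."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

For a multi-organ task, one common option is to compute the loss per class and average; whether the paper does this is not stated in this summary.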

## Abstract

Accurate segmentation of tumors and organs from medical images is vital for cancer diagnosis and treatment, yet current AI tools often fail when applied across different imaging types (e.g., CT and MRI) or when multiple organs must be segmented simultaneously. This study introduces a flexible AI system that accepts natural language instructions (such as “segment the meningioma”) and adapts its behavior accordingly, without retraining. By dynamically fusing textual and imaging data, the model achieves high accuracy in both single- and multi-organ tasks. Importantly, the study shows that the best mathematical objective (loss function) depends on the task: Dice works best for single tumors, while Jaccard (IoU) is superior for complex, multi-organ cases. These insights help bridge the gap between AI research and real-world clinical use.

Background/Objectives: Medical image segmentation is crucial for the diagnosis, treatment planning, and monitoring of cancer, yet it remains one of the hardest problems for Artificial Intelligence (AI)-based clinical applications. Deep-learning models have shown near-perfect results on narrow tasks such as single-organ Computed Tomography (CT) segmentation. They fall short in practical settings, however, where cross-modality robustness and multi-organ delineation are essential (e.g., liver Dice dropping to 0.88 ± 0.15 in combined CT-MR scenarios). This fragility exposes two structural gaps: (i) rigid task-specific architectures that cannot adapt to varied clinical instructions, and (ii) the assumption that a single loss function is best across all cancer imaging applications.

Methods: A novel multimodal segmentation framework is proposed that combines natural language prompts with high-fidelity imaging features through Feature-wise Linear Modulation (FiLM) and Conditional Batch Normalization, enabling a single model to adapt dynamically across modalities, organs, and pathologies. Unlike preceding systems, the proposed approach is prompt-driven, context-aware, and end-to-end trainable, aligning computational adaptability with clinical decision-making.

Results: Extensive evaluation on the Brain Tumor Dataset (cancer-relevant neuroimaging) and the CHAOS multi-organ challenge yields two key insights: (1) Dice loss remains optimal for single-organ tasks, while (2) Jaccard (IoU) loss performs better on multi-organ, cross-modality cancer segmentation. This provides empirical evidence that the optimal loss function is task- and context-dependent rather than universal.

Conclusions: The framework's design principles directly address documented clinical workflow requirements and show capabilities that may connect algorithmic innovation with clinical utility, once validated through prospective clinical trials.
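
The two conditioning mechanisms named in the Methods can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' architecture: the function names, the linear projections `w_gamma` and `w_beta`, the use of a single prompt embedding for the whole batch, and the `1 + ...` scale offset in the batch-norm variant are all assumptions made for the example. In both cases a text embedding predicts a per-channel scale and shift that is applied to the image features.

```python
import numpy as np


def film(features, text_emb, w_gamma, w_beta):
    """Feature-wise Linear Modulation of one feature map.

    features : (C, H, W) image features
    text_emb : (D,) prompt embedding (hypothetical encoder output)
    w_gamma, w_beta : (C, D) projections predicting scale and shift
    """
    gamma = w_gamma @ text_emb  # (C,) per-channel scale
    beta = w_beta @ text_emb    # (C,) per-channel shift
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]


def conditional_batch_norm(batch, text_emb, w_gamma, w_beta, eps=1e-5):
    """Batch normalization whose affine parameters depend on the prompt.

    batch : (N, C, H, W) features; one prompt is shared by the whole
    batch here, which is a simplification for illustration.
    """
    # Normalize each channel using statistics over batch and space.
    mean = batch.mean(axis=(0, 2, 3), keepdims=True)
    var = batch.var(axis=(0, 2, 3), keepdims=True)
    normed = (batch - mean) / np.sqrt(var + eps)
    # Scale is predicted as an offset from 1 so that a zero embedding
    # leaves the normalized features unchanged (an assumption).
    gamma = 1.0 + w_gamma @ text_emb
    beta = w_beta @ text_emb
    return gamma[None, :, None, None] * normed + beta[None, :, None, None]
```

Because only `gamma` and `beta` change with the prompt, the same convolutional weights can in principle serve different organs or modalities, which is consistent with the "without retraining" claim in the abstract.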

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Diseases:** Brain Tumor (MESH:D001932), Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12651689/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12651689/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12651689/full.md

---
Source: https://tomesphere.com/paper/PMC12651689