Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation

Mingda Zhang

arXiv:2507.09966 · eess.IV · October 21, 2025

Open Access

TL;DR

This paper introduces a three-tier multimodal fusion architecture for brain tumor segmentation that integrates physical modeling, Transformer-based feature fusion, and semantic guidance from GPT-4V, and reports Dice scores of 0.8665, 0.9014, and 0.8912 on the BraTS 2020, 2021, and 2023 datasets.

Contribution

A three-tier fusion framework that combines physics-based data augmentation, multimodal feature fusion, and semantic guidance to improve brain tumor segmentation accuracy and boundary localization.

Findings

01

Achieved Dice scores of 0.8665, 0.9014, and 0.8912 on the BraTS 2020, 2021, and 2023 datasets, respectively.

02

Reduced the Hausdorff Distance by an average of 6.57 mm relative to the baseline.

03

Validated effectiveness across multiple datasets and modalities.
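
The two metrics named in the findings above, the Dice score and the Hausdorff Distance, can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the representation of a mask as a set of voxel coordinates, the isotropic `spacing` parameter, and the empty-mask conventions are all assumptions made for illustration. It also computes the plain maximum Hausdorff Distance over all foreground voxels; which variant the paper reports is not stated in this summary.

```python
import math

def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two sets of foreground voxel coordinates."""
    pred, target = set(pred), set(target)
    total = len(pred) + len(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & target) / total

def hausdorff_distance(pred, target, spacing=1.0):
    """Symmetric Hausdorff distance between two voxel sets.

    `spacing` is an assumed isotropic voxel size (e.g. in mm).
    Brute force, O(|P| * |T|): fine for a toy example, slow for full volumes.
    """
    pred, target = set(pred), set(target)
    if not pred or not target:
        return math.inf  # assumed convention when one mask is empty

    def directed(a, b):
        # Largest distance from any point in a to its nearest point in b.
        return max(min(math.dist(p, q) for q in b) for p in a)

    return spacing * max(directed(pred, target), directed(target, pred))

# Toy 2D example: the prediction misses one column of the target.
pred = {(1, 1), (1, 2), (2, 1), (2, 2)}
target = pred | {(1, 3), (2, 3)}

print(dice_score(pred, target))          # 0.8
print(hausdorff_distance(pred, target))  # 1.0
```

In the toy example the overlap is 4 voxels out of 4 + 6, giving a Dice score of 0.8, while the farthest missed voxel lies one voxel away from the prediction, giving a Hausdorff Distance of 1.0. The example shows why the two metrics are reported together: Dice summarizes volumetric overlap, whereas the Hausdorff Distance is driven by the worst boundary error.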

Abstract

Accurate brain tumor segmentation is crucial for neuro-oncology diagnosis and treatment planning. Deep learning methods have made significant progress, but automatic segmentation still faces challenges, including tumor morphological heterogeneity and complex three-dimensional spatial relationships. This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation. The method processes information progressively at the pixel, feature, and semantic levels. At the pixel level, physical modeling extends magnetic resonance imaging (MRI) to multimodal data, including simulated ultrasound and synthetic computed tomography (CT). At the feature level, the method performs Transformer-based cross-modal feature fusion through multi-teacher collaborative distillation, integrating three expert teachers (MRI, US, CT). At the semantic level, clinical textual knowledge…
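
As a rough illustration of the feature-level idea of distilling from three expert teachers (MRI, US, CT), the sketch below averages temperature-softened KL divergence terms between each teacher's class distribution and the student's. Everything here is an assumption for illustration: the abstract does not give the loss, so the KL form, the temperature, the equal default weights, and all function names are hypothetical, and the Transformer-based cross-modal fusion itself is not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def multi_teacher_kd_loss(student_logits, teacher_logits, weights=None, temperature=2.0):
    """Weighted sum of KL(teacher || student) over several teachers.

    teacher_logits: dict mapping a teacher name (e.g. "MRI") to its logits.
    weights: optional dict of per-teacher weights; equal weights are assumed
             by default, which is a choice of this sketch, not of the paper.
    """
    names = list(teacher_logits)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    student = softmax(student_logits, temperature)
    loss = 0.0
    for name in names:
        teacher = softmax(teacher_logits[name], temperature)
        loss += weights[name] * kl_divergence(teacher, student)
    # Scaling by T^2 is the usual convention in temperature-based distillation.
    return temperature ** 2 * loss

# Hypothetical per-voxel logits over three classes.
teachers = {
    "MRI": [2.0, 0.5, -1.0],
    "US": [1.5, 1.0, -0.5],
    "CT": [2.5, 0.0, -1.5],
}
student = [1.0, 1.0, 0.0]
print(multi_teacher_kd_loss(student, teachers))
```

The loss is zero when the student reproduces every teacher's distribution and grows as it diverges from them; per-teacher weights would be the natural place to express that some modalities are more reliable than others.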

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Medical Image Segmentation Techniques