Multimodal Information Interaction for Medical Image Segmentation
Xinxin Fan, Lin Liu, Haoran Zhang

TL;DR
This paper introduces MicFormer, a novel multimodal transformer architecture that effectively fuses features from different medical imaging modalities, significantly improving segmentation accuracy in multimodal medical images.
Contribution
The paper proposes MicFormer, a dual-stream Cross Transformer that incorporates a deformable Transformer architecture, to better integrate multimodal features for medical image segmentation.
Findings
Achieved a DICE score of 85.57 on whole-heart segmentation.
Outperformed existing methods by margins of 2.83 in DICE and 4.23 in MIoU.
Demonstrated effective multimodal feature communication and fusion.
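The DICE and MIoU figures above are overlap scores between predicted and ground-truth masks. Because Dice cannot exceed 1, the reported values appear to be on a 0 to 100 scale. Below is a minimal sketch of how such per-class scores are commonly computed on flattened label maps. The function names, the choice to average over foreground labels only, and the single-pair (rather than per-volume) averaging are assumptions, not the paper's evaluation protocol.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int], label: int) -> Tuple[float, float]:
    """Dice and IoU for one class label on flattened label maps (hypothetical helper)."""
    p = [x == label for x in pred]
    t = [y == label for y in target]
    size_p, size_t = sum(p), sum(t)
    if size_p + size_t == 0:
        # Class absent from both maps: treated as perfect agreement (a convention choice).
        return 1.0, 1.0
    inter = sum(a and b for a, b in zip(p, t))
    union = size_p + size_t - inter
    dice = 2 * inter / (size_p + size_t)  # 2|A ∩ B| / (|A| + |B|)
    iou = inter / union                   # |A ∩ B| / |A ∪ B|
    return dice, iou


def mean_scores(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Mean Dice and mean IoU over the given labels (assumed: foreground classes only)."""
    dices, ious = zip(*(dice_and_iou(pred, target, c) for c in labels))
    return sum(dices) / len(dices), sum(ious) / len(ious)
```

For pred = [0, 1, 1, 2, 2, 0] and target = [0, 1, 2, 2, 2, 0] over labels [1, 2], `mean_scores` returns a mean Dice of about 0.733 and a mean IoU of about 0.583. For the same masks Dice is never smaller than IoU, which is one reason the two margins reported above differ.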
Abstract
The use of multimodal data in assisted diagnosis and segmentation has emerged as a prominent area of interest in current research. However, one of the primary challenges is how to effectively fuse multimodal features. Most of the current approaches focus on the integration of multimodal features while ignoring the correlation and consistency between different modal features, leading to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to simultaneously extract features from each modality. Leveraging the Cross Transformer, it queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features. Additionally, we incorporate a deformable Transformer architecture to…
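The abstract describes the Cross Transformer as querying features from one modality and retrieving responses from the other. A minimal way to sketch that idea is single-head cross-attention in which queries are projected from one stream and keys and values from the other, applied in both directions so each stream receives information from its partner. Everything beyond that is an assumption. The names `cross_attention` and `bidirectional_exchange` are hypothetical. The residual addition is a common Transformer convention that the summary does not confirm. The deformable component, multi-head splitting, normalization, and feed-forward layers are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are token feature vectors
Weights = Tuple[Matrix, Matrix, Matrix]  # (w_q, w_k, w_v) projection matrices


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(x_query: Matrix, x_context: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Scaled dot-product attention: queries from one modality, keys/values from the other."""
    q = matmul(x_query, w_q)
    k = matmul(x_context, w_k)
    v = matmul(x_context, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul(softmax_rows(scores), v)  # one response per query token


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_exchange(feat_a: Matrix, feat_b: Matrix, w_ab: Weights, w_ba: Weights) -> Tuple[Matrix, Matrix]:
    """Hypothetical dual-stream step: A queries B and B queries A.

    Each response is added back to its own stream as a residual, so w_v must map
    back to the stream's feature dimension (assumed here).
    """
    a_from_b = cross_attention(feat_a, feat_b, *w_ab)
    b_from_a = cross_attention(feat_b, feat_a, *w_ba)
    return add(feat_a, a_from_b), add(feat_b, b_from_a)
```

With identity projections and a context stream that holds a single token, every query receives that token's value vector, because a softmax over one score is exactly 1. In a trained model the three projections per direction would be learned parameters, and the two streams may hold different numbers of tokens, as in the test with two and three tokens.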
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
