Text-guided multi-stage cross-perception network for medical image segmentation
Gaoyu Chen, Haixia Pan

TL;DR
This paper introduces TMC, a novel multi-stage cross-perception network that leverages text prompts to improve medical image segmentation by enhancing cross-modal interaction and semantic understanding.
Contribution
The paper proposes TMC, which combines a Multi-stage Cross-attention Module (MCM) with a Multi-stage Alignment Loss (MA Loss) to make better use of text prompts in medical image segmentation.
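As a rough illustration of the cross-modal interaction that a cross-attention module relies on, the sketch below shows a single generic scaled dot-product cross-attention step in which image tokens query text tokens. It is not the paper's module: the function names, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions made for illustration, and the multi-stage design is not reproduced.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical helper, not from the paper).

    image_tokens: (n_img, d_in) array, used to build the queries.
    text_tokens:  (n_txt, d_in) array, used to build the keys and values.
    w_q, w_k, w_v: (d_in, d_head) projection matrices (assumed, normally learned).
    Returns the attended features (n_img, d_head) and the attention weights (n_img, n_txt).
    """
    q = image_tokens @ w_q
    k = text_tokens @ w_k
    v = text_tokens @ w_v
    # Scaled dot-product scores between every image token and every text token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Each image token gets a distribution over the text tokens.
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Each row of the returned weights sums to 1, so every image position receives a weighted mix of text features. A multi-stage variant would presumably apply such a step at several feature resolutions, but how TMC does this is not described in the text here.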
Findings
TMC achieves higher Dice scores than baseline methods on three datasets.
The Multi-stage Cross-attention Module improves semantic feature extraction.
Experimental results demonstrate the effectiveness of the proposed approach.
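For readers unfamiliar with the metric named in the findings, here is a minimal sketch of the standard Dice coefficient for binary masks, 2|A ∩ B| / (|A| + |B|). The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the paper's exact evaluation protocol is not given in the text here.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks (hypothetical helper).

    pred, target: array-likes of the same shape; nonzero entries count as foreground.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / denom
```

A score of 1 means the predicted and reference masks overlap exactly, and 0 means they share no foreground pixels.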
Abstract
Medical image segmentation plays a crucial role in clinical medicine, serving as a key tool for auxiliary diagnosis, treatment planning, and disease monitoring. However, traditional segmentation methods such as U-Net are often limited by weak semantic expression of target regions, which stems from insufficient generalization and a lack of interactivity. Incorporating text prompts offers a promising avenue to more accurately pinpoint lesion locations, yet existing text-guided methods are still hindered by insufficient cross-modal interaction and inadequate cross-modal feature representation. To address these challenges, we propose the Text-guided Multi-stage Cross-perception network (TMC). TMC incorporates a Multi-stage Cross-attention Module (MCM) to enhance the model's understanding of fine-grained semantic details and a Multi-stage Alignment Loss (MA Loss) to improve the consistency…
Taxonomy
Methods: Concatenated Skip Connection · Softmax
