Text-driven Multiplanar Visual Interaction for Semi-supervised Medical Image Segmentation

Kaiwen Huang; Yi Zhou; Huazhu Fu; Yizhe Zhang; Chen Gong; Tao Zhou

arXiv:2507.12382 · cs.CV · July 17, 2025

Open Access

TL;DR

This paper introduces Text-SemiSeg, a text-driven multiplanar interaction framework that uses textual information to improve semi-supervised 3D medical image segmentation when annotated data is scarce.

Contribution

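The framework comprises three modules: Text-enhanced Multiplanar Representation (TMR), which enhances visual features with text; Category-aware Semantic Alignment (CSA), which aligns semantic information across modalities; and Dynamic Cognitive Augmentation (DCA), which reduces the distribution gap between labeled and unlabeled data.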

Findings

01 Outperforms existing methods on three public datasets.

02 Effectively incorporates textual data to enhance segmentation accuracy.

03 Improves robustness of semi-supervised segmentation models.

Abstract

Semi-supervised medical image segmentation is a crucial technique for alleviating the high cost of data annotation. When labeled data is limited, textual information can provide additional context to enhance visual semantic understanding. However, research exploring the use of textual data to enhance visual semantic embeddings in 3D medical imaging tasks remains scarce. In this paper, we propose a novel text-driven multiplanar visual interaction framework for semi-supervised medical image segmentation (termed Text-SemiSeg), which consists of three main modules: Text-enhanced Multiplanar Representation (TMR), Category-aware Semantic Alignment (CSA), and Dynamic Cognitive Augmentation (DCA). Specifically, TMR facilitates text-visual interaction through planar mapping, thereby enhancing the category awareness of visual features. CSA performs cross-modal semantic alignment between the text…

Taxonomy

Topics: Image Retrieval and Classification Techniques