SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues
Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen

TL;DR
SimTxtSeg introduces a novel weakly-supervised medical image segmentation framework that uses simple text cues to generate pseudo-labels and fuse cross-modal information, achieving state-of-the-art results.
Contribution
The paper proposes a new framework with a Textual-to-Visual Cue Converter and Text-Vision Hybrid Attention for improved weakly-supervised segmentation.
Findings
Achieves state-of-the-art performance on colonic polyp segmentation.
Achieves state-of-the-art performance on MRI brain tumor segmentation.
Effectively leverages text cues for high-quality pseudo-label generation.
Abstract
Weakly-supervised medical image segmentation is a challenging task that aims to reduce annotation cost while maintaining segmentation performance. In this paper, we present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels and, at the same time, studies cross-modal fusion when training segmentation models. Our contribution consists of two key components: an effective Textual-to-Visual Cue Converter that produces visual prompts from text prompts on medical images, and a text-guided segmentation model with Text-Vision Hybrid Attention that fuses text and image features. We evaluate our framework on two medical image segmentation tasks, colonic polyp segmentation and MRI brain tumor segmentation, and achieve consistent state-of-the-art performance. Source code is available at: https://github.com/xyx1024/SimTxtSeg.
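The abstract does not spell out how Text-Vision Hybrid Attention is built, so the following is only a generic illustration of one common way to fuse text and image features: single-head cross-attention in which image tokens attend to text tokens. It is not the authors' implementation. The function name `cross_attention_fuse`, the token counts, the feature width, and the random projection matrices are all hypothetical choices made for this sketch.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(image_tokens, text_tokens, w_q, w_k, w_v):
    """Generic cross-attention fusion (an assumption, not the paper's module).

    image_tokens: (n_img, d) visual features, used as queries.
    text_tokens:  (n_txt, d) text features, used as keys and values.
    """
    q = image_tokens @ w_q                      # (n_img, d)
    k = text_tokens @ w_k                       # (n_txt, d)
    v = text_tokens @ w_v                       # (n_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot-product scores
    weights = softmax(scores, axis=-1)          # each image token's weights over text tokens
    fused = image_tokens + weights @ v          # residual connection keeps the visual signal
    return fused, weights


# Toy inputs: 16 image tokens and 4 text tokens with 8-dimensional features.
rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(16, d))
text_tokens = rng.normal(size=(4, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

fused, weights = cross_attention_fuse(image_tokens, text_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch each row of `weights` sums to one, so every image token receives a convex combination of the projected text features added on top of its original value. For the actual module, refer to the source code linked in the abstract.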
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need
