CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images
Xinjie Sun, Boxiong Wei, Yalong Jiang, Liquan Mao, Qi Zhao

TL;DR
CLIP-TNseg introduces a multi-modal hybrid framework combining semantic and fine-grained features for improved thyroid nodule segmentation in ultrasound images, enhancing accuracy and robustness.
Contribution
It presents a novel multi-modal framework that integrates semantic features from a frozen CLIP model with fine-grained features from U-Net style residual blocks for thyroid nodule segmentation.
Findings
Achieves competitive segmentation accuracy on public and new datasets.
Effectively combines high-level semantic and spatial features.
Demonstrates robustness and interpretability improvements.
Abstract
Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. However, existing methods face challenges in segmentation accuracy, interpretability, and generalization, which hinder their performance. This letter proposes a novel framework, CLIP-TNseg, to address these issues by integrating a multimodal large model with a neural network architecture. CLIP-TNseg consists of two main branches: the Coarse-grained Branch, which extracts high-level semantic features from a frozen CLIP model, and the Fine-grained Branch, which captures fine-grained features using U-Net style residual blocks. These features are fused and processed by the prediction head to generate precise segmentation maps. CLIP-TNseg leverages the Coarse-grained Branch to enhance semantic understanding through textual and high-level visual features, while the Fine-grained Branch…
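The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature maps are random placeholders rather than outputs of CLIP or of U-Net style residual blocks, and the fusion by channel concatenation, the nearest-neighbour upsampling, the 1x1 linear prediction head, and every name and shape below are assumptions chosen only to show how a low-resolution semantic map and a high-resolution detail map could be combined into a segmentation mask.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_and_predict(coarse, fine, weight, bias, threshold=0.5):
    """Concatenate two feature maps along channels, then apply a 1x1 projection.

    coarse: (Cc, h, w) stand-in for high-level semantic features
    fine:   (Cf, H, W) stand-in for fine-grained features, with H a multiple of h
    weight: (Cc + Cf,) weights of a hypothetical 1x1 prediction head
    """
    factor = fine.shape[1] // coarse.shape[1]
    coarse_up = upsample_nearest(coarse, factor)                  # (Cc, H, W)
    fused = np.concatenate([coarse_up, fine], axis=0)             # (Cc + Cf, H, W)
    logits = np.tensordot(weight, fused, axes=([0], [0])) + bias  # (H, W)
    prob = 1.0 / (1.0 + np.exp(-logits))                          # per-pixel sigmoid
    return prob, prob > threshold

# Placeholder inputs: random features, not real model outputs.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(8, 4, 4))
fine = rng.normal(size=(4, 16, 16))
weight = rng.normal(size=(12,))

prob, mask = fuse_and_predict(coarse, fine, weight, bias=0.0)
print(prob.shape, mask.dtype)
```

In a trained model the weights of the prediction head would be learned and the coarse features would come from the frozen CLIP encoders. Here they are random, so the resulting mask carries no meaning beyond demonstrating the shapes involved.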
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net · Contrastive Language-Image Pre-training
