TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model

Yihao Zhao; Enhao Zhong; Cuiyun Yuan; Yang Li; Man Zhao; Chunxia Li; Jun Hu; Chenbin Liu

arXiv:2409.03412 · cs.CV · September 6, 2024

TL;DR

TG-LMM is a text-guided multi-modal segmentation model that feeds expert descriptions of organ locations into the segmentation process, combining pre-trained image and text encoders with a dedicated fusion structure to improve medical image segmentation accuracy.

Contribution

The paper presents a new method that effectively integrates textual prior knowledge into medical image segmentation using pre-trained models and a specialized fusion structure.

Findings

01. Outperforms existing methods such as MedSAM, SAM, and nnU-Net.

02. Achieves higher segmentation accuracy across multiple medical datasets.

03. Reduces the number of trainable parameters and accelerates training.

Abstract

We propose TG-LMM (Text-Guided Large Multi-Modal Model), a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: current medical automatic segmentation models do not effectively utilize prior knowledge, such as descriptions of organ locations; previous text-visual models focus on identifying the target rather than improving the segmentation accuracy; prior models attempt to use prior knowledge to enhance accuracy but do not incorporate pre-trained models. To address these issues, TG-LMM integrates prior knowledge, specifically expert descriptions of the spatial locations of organs, into the segmentation process. Our model utilizes pre-trained image and text encoders to reduce the number of training parameters and accelerate the training process. Additionally,…
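
The abstract's point that pre-trained encoders reduce the number of training parameters can be illustrated with simple bookkeeping. The sketch below is not the authors' implementation: the component names (`fusion_module`, `mask_decoder`) and all parameter counts are hypothetical placeholders, and the assumption that both encoders stay frozen during training is ours. It only shows why freezing large pre-trained blocks leaves a small fraction of the model for the optimizer to update.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One block of a segmentation model, with a hypothetical parameter count."""
    name: str
    num_params: int
    frozen: bool  # True: weights stay at their pre-trained values


def trainable_params(components):
    """Count only the parameters that an optimizer would update."""
    return sum(c.num_params for c in components if not c.frozen)


def total_params(components):
    """Count every parameter, frozen or not."""
    return sum(c.num_params for c in components)


# Placeholder sizes chosen for illustration only; they are not from the paper.
model = [
    Component("image_encoder (pre-trained)", 90_000_000, frozen=True),
    Component("text_encoder (pre-trained)", 60_000_000, frozen=True),
    Component("fusion_module", 4_000_000, frozen=False),
    Component("mask_decoder", 6_000_000, frozen=False),
]

trainable = trainable_params(model)
total = total_params(model)
print(f"trainable: {trainable:,} of {total:,} parameters")
```

With these placeholder numbers, only the fusion and decoder blocks contribute to the trainable count; marking either encoder as `frozen=False` would add its full size back, which is the cost that reusing pre-trained encoders is meant to avoid.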

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging

Methods: Segment Anything Model · Focus