TL;DR
This paper demonstrates that integrating a frozen pre-trained LLM layer into CNN-based medical image segmentation models enhances performance across multiple modalities by leveraging the semantic awareness of LLMs with minimal additional parameters.
Contribution
Introduces a simple hybrid framework that embeds a frozen pre-trained LLM layer into CNN segmentation models, improving accuracy and robustness across various medical imaging modalities.
Findings
Performance improved across ultrasound, dermoscopy, polypscopy, and CT scans.
Robustness validated with different LLMs like LLaMA and DeepSeek.
Segmentation quality improved with only a minimal increase in trainable parameters.
Abstract
With the advancement of Large Language Models (LLMs) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within the CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
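To make the "minimal increase in trainable parameters" claim concrete, here is a rough parameter-count sketch. It is not taken from the paper: it assumes the CNN bottleneck features are flattened into visual tokens, mapped to the LLM width by one trainable linear layer, passed through a single frozen LLaMA-style block, and mapped back by a second trainable linear layer. The names `HybridConfig`, `adapter_params`, and `frozen_layer_params` are hypothetical, and every dimension is an illustrative assumption (the LLM sizes are those of a 7B LLaMA-style layer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    # All values below are illustrative assumptions, not figures from the paper.
    cnn_channels: int = 512   # channels of the CNN bottleneck feature map
    feat_h: int = 16          # bottleneck height
    feat_w: int = 16          # bottleneck width
    llm_hidden: int = 4096    # hidden size of the frozen LLM layer
    llm_ffn: int = 11008      # feed-forward width of its gated MLP


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus bias of one fully connected layer."""
    return d_in * d_out + d_out


def num_visual_tokens(cfg: HybridConfig) -> int:
    """Each spatial position of the feature map becomes one token."""
    return cfg.feat_h * cfg.feat_w


def adapter_params(cfg: HybridConfig) -> int:
    """Trainable part: project tokens up to the LLM width and back down."""
    up = linear_params(cfg.cnn_channels, cfg.llm_hidden)
    down = linear_params(cfg.llm_hidden, cfg.cnn_channels)
    return up + down


def frozen_layer_params(cfg: HybridConfig) -> int:
    """Frozen part: one LLaMA-style block without biases."""
    d, f = cfg.llm_hidden, cfg.llm_ffn
    attention = 4 * d * d   # query, key, value, and output projections
    gated_mlp = 3 * d * f   # gate, up, and down projections
    norms = 2 * d           # two normalization scale vectors
    return attention + gated_mlp + norms


if __name__ == "__main__":
    cfg = HybridConfig()
    trainable = adapter_params(cfg)
    frozen = frozen_layer_params(cfg)
    print(f"visual tokens:    {num_visual_tokens(cfg)}")
    print(f"trainable params: {trainable:,}")
    print(f"frozen params:    {frozen:,}")
    print(f"trainable share:  {trainable / (trainable + frozen):.2%}")
```

Under these assumed sizes, the two projection layers are the only new weights the optimizer updates, while the much larger LLM block contributes no trainable parameters because it stays frozen.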
Taxonomy
Methods: LLaMA
