Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster

Fenghe Tang; Wenxin Ma; Zhiyang He; Xiaodong Tao; Zihang Jiang; S. Kevin Zhou

arXiv:2506.18034·cs.CV·June 24, 2025

TL;DR

This paper demonstrates that integrating a frozen pre-trained LLM layer into CNN-based medical image segmentation models enhances performance across multiple modalities by leveraging the semantic awareness of LLMs with minimal additional parameters.

Contribution

Introduces LLM4Seg, a simple hybrid framework that embeds a frozen pre-trained LLM layer within a CNN encoder-decoder segmentation model, improving segmentation performance across several medical imaging modalities.

Findings

01. Segmentation performance improves across ultrasound, dermoscopy, polypscopy, and CT scans.

02. The improvement is robust across different LLMs, validated with LLaMA and DeepSeek.

03. The gains come with only a minimal increase in trainable parameters.

Abstract

With the advancement of Large Language Models (LLMs) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within the CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, polypscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the semantic awareness of LLMs to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
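
The hybrid structure described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names (`FrozenLayerBottleneck`, `TinySegNet`), the channel sizes, the placement of the frozen layer between encoder and decoder, and the linear projections around it are all assumptions made for illustration. To keep the example self-contained, a randomly initialized `nn.TransformerEncoderLayer` stands in for the pre-trained LLaMA or DeepSeek layer; in practice one would load a real pre-trained layer instead.

```python
import torch
import torch.nn as nn


class FrozenLayerBottleneck(nn.Module):
    """Hypothetical bottleneck: CNN features -> tokens -> frozen layer -> CNN features."""

    def __init__(self, cnn_channels: int, llm_dim: int, frozen_layer: nn.Module):
        super().__init__()
        # Trainable projections that match the CNN width to the frozen layer's width.
        self.proj_in = nn.Linear(cnn_channels, llm_dim)
        self.proj_out = nn.Linear(llm_dim, cnn_channels)
        self.frozen_layer = frozen_layer
        # Freeze the (stand-in) LLM layer so it adds no trainable parameters.
        for p in self.frozen_layer.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one visual token per location
        tokens = self.proj_in(tokens)           # (B, H*W, llm_dim)
        tokens = self.frozen_layer(tokens)      # frozen layer processes the visual tokens
        tokens = self.proj_out(tokens)          # (B, H*W, C)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class TinySegNet(nn.Module):
    """Toy CNN encoder-decoder with the frozen-layer bottleneck in between (assumed placement)."""

    def __init__(self, frozen_layer: nn.Module, llm_dim: int, num_classes: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.bottleneck = FrozenLayerBottleneck(32, llm_dim, frozen_layer)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.bottleneck(self.encoder(x)))


def count_params(model: nn.Module):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen


if __name__ == "__main__":
    # Stand-in for a pre-trained LLM layer (an assumption; randomly initialized here).
    stand_in = nn.TransformerEncoderLayer(
        d_model=64, nhead=4, dim_feedforward=128, dropout=0.0, batch_first=True
    )
    model = TinySegNet(stand_in, llm_dim=64)
    logits = model(torch.randn(2, 3, 32, 32))   # per-pixel logits at input resolution
    print(logits.shape, count_params(model))
```

Only the CNN and the two projections receive gradients here, which mirrors the abstract's claim of a minimal increase in trainable parameters; the frozen layer contributes parameters to the forward pass but none to the optimizer.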

Code & Models

Repositories

fenghetan9/llm4seg (PyTorch, official)

Taxonomy

Methods: LLaMA