SAM-PTx: Text-Guided Fine-Tuning of SAM with Parameter-Efficient, Parallel-Text Adapters
Shayan Jalilian, Abdul Bais

TL;DR
This paper presents SAM-PTx, a parameter-efficient method for adapting the Segment Anything Model (SAM) with text embeddings, enabling semantics-guided segmentation without retraining the entire model.
Contribution
Introduces a lightweight Parallel-Text adapter that injects frozen CLIP text embeddings into SAM's image encoder for semantics-guided segmentation.
Findings
Text-guided adaptation improves segmentation accuracy.
SAM-PTx outperforms spatial prompt baselines on COD10K.
First use of text prompts for segmentation on COD10K.
Abstract
The Segment Anything Model (SAM) has demonstrated impressive generalization in prompt-based segmentation. Yet, the potential of semantic text prompts remains underexplored compared to traditional spatial prompts like points and boxes. This paper introduces SAM-PTx, a parameter-efficient approach for adapting SAM using frozen CLIP-derived text embeddings as class-level semantic guidance. Specifically, we propose a lightweight adapter design called Parallel-Text that injects text embeddings into SAM's image encoder, enabling semantics-guided segmentation while keeping most of the original architecture frozen. Our adapter modifies only the MLP-parallel branch of each transformer block, preserving the attention pathway for spatial reasoning. Through supervised experiments and ablations on the COD10K dataset as well as low-data subsets of COCO and ADE20K, we show that incorporating fixed…
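The abstract describes the adapter only at a high level: it sits parallel to the MLP of each transformer block, receives a frozen text embedding, and leaves the attention pathway untouched. The NumPy sketch below illustrates one way such a branch could be wired. It is not the authors' implementation. The bottleneck shape, the additive fusion of the text projection, the zero-initialized up-projection, and every name (`ParallelTextAdapter`, `block_mlp_stage`, `w_down`, `w_text`, `w_up`) are assumptions made for illustration, and the random vector stands in for a real CLIP text embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class ParallelTextAdapter:
    """Hypothetical bottleneck adapter that runs in parallel to a block's MLP."""

    def __init__(self, dim, text_dim, bottleneck):
        # Trainable adapter weights (assumed layout, not taken from the paper).
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.w_text = rng.normal(0.0, 0.02, (text_dim, bottleneck))
        # Zero init (an assumption): the adapter starts as a no-op.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, tokens, text_emb):
        # tokens: (num_tokens, dim); text_emb: (text_dim,)
        # The projected text vector is broadcast across all image tokens.
        h = tokens @ self.w_down + text_emb @ self.w_text
        return gelu(h) @ self.w_up

def block_mlp_stage(tokens, frozen_mlp, adapter, text_emb):
    # Residual + frozen MLP + parallel text-conditioned adapter branch.
    # The attention sub-layer is assumed to have run before this stage, unchanged.
    return tokens + frozen_mlp(tokens) + adapter(tokens, text_emb)

# Toy usage with made-up sizes.
dim, text_dim = 8, 4
tokens = rng.normal(size=(5, dim))        # stand-in for image tokens
text_emb = rng.normal(size=text_dim)      # stand-in for a frozen CLIP text embedding
w_mlp = rng.normal(0.0, 0.02, (dim, dim))
frozen_mlp = lambda t: gelu(t @ w_mlp)    # stand-in for the frozen SAM MLP
adapter = ParallelTextAdapter(dim, text_dim, bottleneck=2)
out = block_mlp_stage(tokens, frozen_mlp, adapter, text_emb)
```

With the up-projection at zero, the block output matches the frozen model exactly, so in this sketch any change in behavior would come only from training the adapter weights.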
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
