SAM-PTx: Text-Guided Fine-Tuning of SAM with Parameter-Efficient, Parallel-Text Adapters

Shayan Jalilian; Abdul Bais

arXiv:2508.00213 · cs.CV · August 4, 2025

TL;DR

This paper presents SAM-PTx, a parameter-efficient method for adapting the Segment Anything Model (SAM) with text embeddings, enabling semantics-guided segmentation without retraining the entire model.

Contribution

Introduces a lightweight Parallel-Text adapter that injects frozen CLIP text embeddings into SAM's image encoder, providing class-level semantic guidance for segmentation.

Findings

1. Text-guided adaptation improves segmentation accuracy.
2. SAM-PTx outperforms spatial prompt baselines on COD10K.
3. First use of text prompts for segmentation on COD10K.

Abstract

The Segment Anything Model (SAM) has demonstrated impressive generalization in prompt-based segmentation. Yet, the potential of semantic text prompts remains underexplored compared to traditional spatial prompts like points and boxes. This paper introduces SAM-PTx, a parameter-efficient approach for adapting SAM using frozen CLIP-derived text embeddings as class-level semantic guidance. Specifically, we propose a lightweight adapter design called Parallel-Text that injects text embeddings into SAM's image encoder, enabling semantics-guided segmentation while keeping most of the original architecture frozen. Our adapter modifies only the MLP-parallel branch of each transformer block, preserving the attention pathway for spatial reasoning. Through supervised experiments and ablations on the COD10K dataset as well as low-data subsets of COCO and ADE20K, we show that incorporating fixed…
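The abstract describes the adapter only at a high level, so the following is a minimal NumPy sketch of one way a text-conditioned branch could run in parallel with a frozen MLP. It is not the authors' implementation. The class name `ParallelTextAdapter`, the bottleneck layout, the additive fusion of the text embedding, the zero-initialized up-projection, the `scale` factor, and all dimensions are assumptions made for illustration; layer normalization and the attention sub-layer are omitted because the abstract states the attention pathway is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class ParallelTextAdapter:
    """Hypothetical bottleneck adapter that mixes a text embedding into image tokens."""

    def __init__(self, dim, text_dim, bottleneck, scale=0.1):
        # Assumed layout: down-project tokens, add projected text, up-project.
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        self.w_text = rng.normal(0.0, 0.02, (text_dim, bottleneck))
        # Assumed zero init, so the adapter contributes nothing before training.
        self.w_up = np.zeros((bottleneck, dim))
        self.scale = scale

    def __call__(self, tokens, text_emb):
        # tokens: (num_tokens, dim); text_emb: (text_dim,)
        # The projected text vector is broadcast across all image tokens.
        h = tokens @ self.w_down + text_emb @ self.w_text
        return self.scale * (gelu(h) @ self.w_up)

def block_mlp_stage(tokens, frozen_mlp, adapter, text_emb):
    # Residual + frozen MLP branch + parallel text-conditioned adapter branch.
    return tokens + frozen_mlp(tokens) + adapter(tokens, text_emb)

# Toy sizes, chosen only for the demo.
dim, text_dim, bottleneck = 8, 4, 2
tokens = rng.normal(size=(5, dim))        # stand-in for image tokens
text_emb = rng.normal(size=(text_dim,))   # stand-in for a frozen CLIP text embedding

# Stand-in for the frozen MLP weights of one transformer block.
w1 = rng.normal(0.0, 0.02, (dim, 4 * dim))
w2 = rng.normal(0.0, 0.02, (4 * dim, dim))

def frozen_mlp(x):
    return gelu(x @ w1) @ w2

adapter = ParallelTextAdapter(dim, text_dim, bottleneck)
out = block_mlp_stage(tokens, frozen_mlp, adapter, text_emb)
```

In this sketch only `w_down`, `w_text`, and `w_up` would be trained, while `w1` and `w2` stay fixed, which mirrors the parameter-efficient setup the abstract describes. The zero-initialized `w_up` is a common adapter convention that makes the block behave exactly like the frozen model at the start of training; whether SAM-PTx uses it is not stated in the abstract.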

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis