SemPT: Semantic Prompt Tuning for Vision-Language Models

Xiao Shi; Yangjun Ou; Zhenzhong Chen

arXiv:2508.10645 · cs.CV · August 15, 2025

TL;DR

SemPT introduces a semantic prompt tuning framework that leverages shared attribute-level knowledge and a two-step prompting strategy to improve transferability and generalization of vision-language models to unseen categories.

Contribution

The paper proposes a novel semantic prompt tuning method that enhances transferability by extracting shared attributes and aligning image and text embeddings for better unseen category recognition.

Findings

1. Achieves state-of-the-art results on 15 benchmark datasets.
2. Improves generalization to unseen categories in zero-shot and few-shot settings.
3. Effectively balances discrimination and transferability through attribute-enhanced embeddings.

Abstract

Visual transfer learning for unseen categories is an active research topic yet remains a challenging task, owing to the inherent conflict between preserving category-specific representations and acquiring transferable knowledge. Vision-Language Models (VLMs) pre-trained on large amounts of image-text pairs offer a promising solution. However, existing prompt tuning methods rely on sparse category labels or disparate LLM-generated descriptions, which fragment knowledge representation and hinder transferability. To address this limitation, we introduce Semantic Prompt Tuning (SemPT), a novel framework that tackles the generalization challenge by leveraging shared attribute-level knowledge across categories. Specifically, SemPT adopts a two-step prompting strategy to guide an LLM in extracting shared visual attributes and generating attribute-level descriptions, capturing transferable semantic…
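The two-step prompting strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt templates, the function names (`extract_shared_attributes`, `describe_by_attribute`), and the `fake_llm` stand-in are all hypothetical, chosen only to show the general shape of "first ask for attributes shared across categories, then ask for one description per category and attribute". A real setup would replace `fake_llm` with a call to an actual LLM.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates; the paper's actual wording is not given here.
STEP1_TEMPLATE = (
    "List visual attributes that are shared across these categories "
    "and useful for telling them apart: {categories}. "
    "Answer with a comma-separated list."
)
STEP2_TEMPLATE = "Describe the {attribute} of a {category} in one short sentence."


def extract_shared_attributes(
    categories: List[str], llm: Callable[[str], str]
) -> List[str]:
    """Step 1 (assumed form): ask once for attributes common to all categories."""
    reply = llm(STEP1_TEMPLATE.format(categories=", ".join(categories)))
    return [a.strip() for a in reply.split(",") if a.strip()]


def describe_by_attribute(
    categories: List[str], attributes: List[str], llm: Callable[[str], str]
) -> Dict[str, Dict[str, str]]:
    """Step 2 (assumed form): one description per (category, attribute) pair."""
    return {
        c: {
            a: llm(STEP2_TEMPLATE.format(attribute=a, category=c)).strip()
            for a in attributes
        }
        for c in categories
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    if prompt.startswith("List visual attributes"):
        return "shape, color, texture"
    return f"[description for: {prompt}]"


if __name__ == "__main__":
    cats = ["cat", "dog"]
    attrs = extract_shared_attributes(cats, fake_llm)
    descriptions = describe_by_attribute(cats, attrs, fake_llm)
    for category, by_attr in descriptions.items():
        for attribute, text in by_attr.items():
            print(category, attribute, text)
```

Because every category is described along the same attribute axes, the resulting descriptions are directly comparable across categories, which is the intuition behind sharing attribute-level knowledge rather than generating unrelated free-form descriptions per class.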
