Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations

Samyak Rawlekar; Shubhang Bhatnagar; Narendra Ahuja

arXiv:2409.08381 · cs.CV · September 16, 2024

TL;DR

This paper investigates prompt-learning strategies for multi-label recognition with partial annotations and finds that learning only positive prompts, while replacing negative prompts with embeddings learned directly in the feature space, outperforms dual-prompt approaches.

Contribution

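The paper introduces PositiveCoOp and NegativeCoOp, in which only one of the two prompts is learned through the text encoder, and shows that learning only the positive prompt and replacing the negative prompt with a directly learned embedding improves multi-label recognition.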

Findings

01. Negative prompts degrade performance in partial-annotation settings.

02. Learning only positive prompts, with negative embeddings learned directly, outperforms dual-prompt methods.

03. A baseline using only vision features performs comparably to dual-prompt approaches when the proportion of missing labels is low.

Abstract

Vision-language models (VLMs) like CLIP have been adapted for Multi-Label Recognition (MLR) with partial annotations by leveraging prompt-learning, where positive and negative prompts are learned for each class to associate their embeddings with class presence or absence in the shared vision-text feature space. While this approach improves MLR performance by relying on VLM priors, we hypothesize that learning negative prompts may be suboptimal, as the datasets used to train VLMs lack image-caption pairs explicitly focusing on class absence. To analyze the impact of positive and negative prompt learning on MLR, we introduce PositiveCoOp and NegativeCoOp, where only one prompt is learned with VLM guidance while the other is replaced by an embedding vector learned directly in the shared feature space without relying on the text encoder. Through empirical analysis, we observe that negative…
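To make the setup in the abstract concrete, the following is a minimal sketch of how per-class positive and negative embeddings could be scored against an image feature and trained with partially annotated labels. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-way softmax scoring, the temperature value, and the masked binary cross-entropy loss are all hypothetical choices, and the paper may use different ones. In a PositiveCoOp-style setup the positive embeddings would come from the text encoder applied to a learned prompt, while the negative embeddings would be free parameters in the shared feature space; here both are passed in as plain vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_presence_probs(image_feat, pos_embeds, neg_embeds, temperature=0.07):
    """Per-class probability that the class is present.

    Assumption: presence is scored by a two-way softmax over the similarity
    to the class's positive and negative embedding, which reduces to a
    sigmoid of the scaled similarity difference.
    """
    probs = []
    for pos, neg in zip(pos_embeds, neg_embeds):
        s_pos = cosine(image_feat, pos) / temperature
        s_neg = cosine(image_feat, neg) / temperature
        probs.append(1.0 / (1.0 + math.exp(-(s_pos - s_neg))))
    return probs


def partial_bce(probs, labels, eps=1e-12):
    """Binary cross-entropy averaged over annotated classes only.

    labels: 1 = present, 0 = absent, None = annotation missing (skipped).
    Assumption: missing labels simply contribute nothing to the loss.
    """
    terms = []
    for p, y in zip(probs, labels):
        if y is None:
            continue
        terms.append(-math.log(p + eps) if y == 1 else -math.log(1.0 - p + eps))
    return sum(terms) / len(terms) if terms else 0.0


# Toy example: 2-D features, two classes (all values are made up).
image_feat = [1.0, 0.0]
pos_embeds = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for text-encoder outputs
neg_embeds = [[0.0, 1.0], [1.0, 0.0]]  # stand-ins for directly learned vectors
probs = class_presence_probs(image_feat, pos_embeds, neg_embeds)
loss = partial_bce(probs, [1, None])   # class 1 is unannotated for this image
```

In this sketch, only the annotated class contributes to `loss`, so the unknown label for the second class neither rewards nor penalizes the model.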

Taxonomy

Topics: Text and Document Classification Technologies

Methods: Contrastive Language-Image Pre-training