TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

M. Jehanzeb Mirza; Leonid Karlinsky; Wei Lin; Horst Possegger; Rogerio Feris; Horst Bischof

arXiv:2309.06809 · cs.CV · September 14, 2023

TL;DR

This paper introduces TAP, a targeted prompting method that enhances text-only training of vision-language models for improved visual classification across various domains and tasks.

Contribution

It proposes a novel targeted prompting strategy for LLM-generated text data, significantly boosting VLM adaptation and recognition performance without paired training data.

Findings

1. Up to 8.4% improvement in domain-specific adaptation

2. Up to 8.7% improvement in fine-grained recognition

3. 3.1% overall average improvement in zero-shot classification

Abstract

Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning to better fit the data distributions of the downstream tasks, in order to overcome the domain shift from the web-based pre-training data. Recently, it has been shown that it is possible to effectively tune VLMs without any paired data, and in particular to effectively improve VLMs' visual recognition performance using text-only training data generated by Large Language Models (LLMs). In this paper, we dive deeper into this exciting text-only VLM training approach and explore ways it can be significantly further improved taking the specifics of the downstream task into account when sampling text data from LLMs. In particular, compared to the SOTA…
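
The abstract's opening sentence refers to recognizing categories "described by text prompts". As a rough illustration of that zero-shot mechanism only (not of TAP's targeted prompting or its text-only training, which the truncated abstract does not detail), the sketch below picks the class whose text embedding is most similar to an image embedding. The function names and the three-dimensional toy vectors are hypothetical stand-ins; a real VLM such as CLIP would produce the embeddings with its own encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the best-matching class name and all similarity scores.

    class_text_embeddings maps a class name to the embedding of its
    text prompt (hypothetical values here, not real CLIP outputs).
    """
    scores = {
        name: cosine(image_embedding, text_embedding)
        for name, text_embedding in class_text_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings, assumed for illustration only.
text_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(image_embedding, text_embeddings)
print(label)
```

Because the classifier is defined entirely by text embeddings, changing or enriching the text side changes the classifier without touching any images, which is the property that text-only adaptation approaches build on.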

Code & Models

Repositories

hasibzunair/rsud20k
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training