LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models

Cheng Shi; Sibei Yang

arXiv:2309.01155·cs.CV·September 25, 2023


TL;DR

LoGoPrompt uses synthetic text images as visual prompts for vision-language models, improving few-shot learning, base-to-new generalization, and domain generalization without additional trainable parameters.

Contribution

The paper proposes LoGoPrompt, an approach that uses synthetic text images as visual prompts and reformulates the classification objective as visual prompt selection.

Findings

01. Outperforms state-of-the-art methods in few-shot learning

02. Enhances base-to-new generalization

03. Improves domain generalization across 16 datasets

Abstract

Prompt engineering is a powerful tool for enhancing the performance of pre-trained models on downstream tasks. For example, the prompt "Let's think step by step" improved GPT-3's reasoning accuracy to 63% on MultiArith, while the prompt "a photo of" filled with a class name enables CLIP to achieve 80% zero-shot accuracy on ImageNet. While previous research has explored prompt learning for the visual modality, analysis of what constitutes a good visual prompt specifically for image recognition remains limited. In addition, existing visual prompt tuning methods generalize worse than text-only prompt tuning. This paper explores our key insight: synthetic text images are good visual prompts for vision-language models! To achieve that, we propose our LoGoPrompt, which reformulates the classification objective as visual prompt selection and addresses the…
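The "a photo of" text prompt mentioned in the abstract can be illustrated with a minimal sketch of zero-shot classification by prompt matching. This is not the LoGoPrompt method and does not call the real CLIP model: the template string, the helper names (`build_prompts`, `zero_shot_probs`), the temperature value, and the three-dimensional toy embeddings are all assumptions chosen for illustration. In practice the embeddings would come from a vision-language model's image and text encoders.

```python
import math


def build_prompts(class_names, template="a photo of a {}."):
    """Fill a hand-written template with each class name (template is an assumption)."""
    return [template.format(name) for name in class_names]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities (one score per class prompt)."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["cat", "dog", "car"]
prompts = build_prompts(class_names)

# Toy stand-ins for encoder outputs; a real model would embed `prompts` and an image.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.9, 0.2]

probs = zero_shot_probs(image_emb, text_embs)
predicted = class_names[probs.index(max(probs))]
print(prompts)
print(predicted)
```

The predicted class is the one whose prompt embedding is closest to the image embedding, so changing the prompt changes the classifier without updating any model weights.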


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training