When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

Hsi-Ai Tsao; Lei Hsiung; Pin-Yu Chen; Tsung-Yi Ho

arXiv:2409.01821·cs.CV·September 5, 2024

TL;DR

This paper introduces a log-likelihood ratio (LLR) analysis for comparing visual prompting with linear probing in vision-language models. The resulting cost-effective measure predicts which method will perform better with accuracies up to 91%, at up to a 100-fold reduction in run time compared to full training.

Contribution

It proposes a log-likelihood ratio approach to evaluate and compare the effectiveness of visual prompting versus linear probing in a resource-efficient manner.

Findings

01. The LLR score effectively compares visual prompting with linear probing.

02. Visual prompting can significantly improve performance on out-of-distribution tasks.

03. Resource-efficient visual prompt approximations reduce run time by up to 100-fold compared to full training.

Abstract

Adapting pre-trained models to new tasks can exhibit varying effectiveness across datasets. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve the performance of out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, can sometimes be the best approach. We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. By employing the LLR score alongside resource-efficient visual prompt approximations, our cost-effective measure attains up to a 100-fold reduction in run time compared to full training, while achieving prediction accuracies up to 91%. The source code is available at https://github.com/IBM/VP-LLR.
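
To make the idea of a log-likelihood ratio concrete, the sketch below computes a generic LLR between two classifiers on a handful of labeled samples. It is not the paper's exact score or its visual prompt approximation: the function names (`log_likelihood`, `llr_score`) and the toy probabilities are hypothetical, chosen only to illustrate that a positive ratio favors the first method and a negative one favors the second.

```python
import math

def log_likelihood(probs, labels):
    """Sum of log-probabilities that a model assigns to the true labels."""
    return sum(math.log(p[y]) for p, y in zip(probs, labels))

def llr_score(probs_a, probs_b, labels):
    """Generic log-likelihood ratio: positive favors method A, negative favors B."""
    return log_likelihood(probs_a, labels) - log_likelihood(probs_b, labels)

# Made-up class probabilities for three samples and two classes (assumption).
labels = [0, 1, 1]
probs_vp = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]  # stand-in for visual prompting
probs_lp = [[0.6, 0.4], [0.5, 0.5], [0.5, 0.5]]  # stand-in for linear probing

score = llr_score(probs_vp, probs_lp, labels)
print("LLR score:", round(score, 4))
print("Favors:", "visual prompting" if score > 0 else "linear probing")
```

In this toy case the first model assigns higher probability to every true label, so the ratio is positive. How the authors obtain the likelihoods without full training is described in the paper and the linked repository, not here.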

Code & Models

Repositories

ibm/vp-llr · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques