LOVM: Language-Only Vision Model Selection

Orr Zohar; Shih-Cheng Huang; Kuan-Chieh Wang; Serena Yeung

arXiv:2306.08893 · cs.CV · June 16, 2023 · 1 citation


TL;DR

This paper introduces LOVM, a new task and benchmark for selecting the best pre-trained vision-language model (VLM) for a downstream application based solely on a text description of that application, eliminating the need for dataset-specific evaluations.

Contribution

We propose a novel task and benchmark for zero-shot model selection using only text descriptions, enabling efficient VLM evaluation without access to downstream datasets.
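To make the task concrete, the following is a minimal sketch of one possible text-only proxy score; it is an illustration under stated assumptions, not the authors' method. Each candidate VLM's text encoder is assumed to embed both class prompts and short text captions describing those classes; each caption is classified by its nearest class-prompt embedding, and the model with the highest caption accuracy is selected. The 2-D vectors are hand-made toy values standing in for a real text encoder, and `text_proxy_accuracy`, `select_model`, and the model names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def text_proxy_accuracy(
    class_prompt_emb: Dict[str, Vector],
    caption_emb: List[Vector],
    caption_labels: List[str],
) -> float:
    """Share of captions whose nearest class-prompt embedding matches their label.

    Hypothetical proxy: every embedding is assumed to come from the text
    encoder of ONE candidate VLM; no images are used.
    """
    if not caption_labels:
        raise ValueError("need at least one labelled caption")
    correct = 0
    for emb, label in zip(caption_emb, caption_labels):
        # Predict the class whose prompt embedding is most similar to the caption.
        pred = max(class_prompt_emb, key=lambda c: cosine(emb, class_prompt_emb[c]))
        correct += int(pred == label)
    return correct / len(caption_labels)


def select_model(per_model_scores: Dict[str, float]) -> str:
    """Pick the candidate with the highest proxy score (ties: first seen)."""
    return max(per_model_scores, key=lambda m: per_model_scores[m])


# Toy 2-D "text embeddings" standing in for two hypothetical candidate VLMs.
labels = ["cat", "dog", "dog"]
model_a = {
    "prompts": {"cat": (1.0, 0.0), "dog": (0.0, 1.0)},
    "captions": [(0.9, 0.1), (0.2, 0.8), (0.6, 0.4)],  # third caption lands nearer "cat"
}
model_b = {
    "prompts": {"cat": (1.0, 0.0), "dog": (0.0, 1.0)},
    "captions": [(0.9, 0.1), (0.2, 0.8), (0.3, 0.7)],
}

scores = {
    name: text_proxy_accuracy(m["prompts"], m["captions"], labels)
    for name, m in {"model_a": model_a, "model_b": model_b}.items()
}
print(scores)                # model_a gets 2 of 3 captions right, model_b gets 3 of 3
print(select_model(scores))  # model_b
```

Because no image is ever embedded, scoring a candidate costs one pass of its text encoder over short strings, which is what makes language-only selection cheap compared with evaluating every model on a labeled image dataset.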

Findings

01. Established the LOVM benchmark, with evaluations of 35 VLMs across 23 datasets.

02. Demonstrated the effectiveness of text-based model ranking methods.

03. Provided insights into zero-shot performance prediction for VLMs.
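Finding 02 concerns how well a ranking derived from text matches the models' actual zero-shot performance. As an illustrative sketch, assuming ranking quality is measured with Kendall's rank correlation (the simple tau-a variant, no tie correction) and a top-1 hit (whether the top-predicted model is truly the best), the comparison can be computed as below. The metric choice, function names, and all scores are hypothetical; the paper and the official repository define the exact LOVM evaluation protocol.

```python
from itertools import combinations
from typing import Dict


def kendall_tau_a(pred: Dict[str, float], true: Dict[str, float]) -> float:
    """Kendall's tau-a between predicted scores and true zero-shot accuracies.

    Pairs tied in either ranking count as neither concordant nor discordant.
    """
    if set(pred) != set(true):
        raise ValueError("pred and true must cover the same models")
    models = sorted(pred)
    n = len(models)
    if n < 2:
        raise ValueError("need at least two models to rank")
    concordant = discordant = 0
    for a, b in combinations(models, 2):
        # Same sign of differences means both rankings order the pair the same way.
        s = (pred[a] - pred[b]) * (true[a] - true[b])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def top1_hit(pred: Dict[str, float], true: Dict[str, float]) -> bool:
    """True if the model ranked first by `pred` is also the best under `true`."""
    return max(pred, key=lambda m: pred[m]) == max(true, key=lambda m: true[m])


# Hypothetical text-derived scores and ground-truth accuracies for three VLMs.
pred = {"vlm_a": 0.9, "vlm_b": 0.5, "vlm_c": 0.7}
true_same_order = {"vlm_a": 0.62, "vlm_b": 0.40, "vlm_c": 0.55}
true_swapped = {"vlm_a": 0.50, "vlm_b": 0.40, "vlm_c": 0.55}

print(kendall_tau_a(pred, true_same_order), top1_hit(pred, true_same_order))  # 1.0 True
print(kendall_tau_a(pred, true_swapped), top1_hit(pred, true_swapped))  # tau = 1/3, top-1 miss
```

The two metrics answer different questions: tau rewards getting the whole ordering right, while the top-1 hit only checks the single model a practitioner would deploy. For real use, `scipy.stats.kendalltau` (tau-b by default, which corrects for ties) is a standard alternative to the hand-rolled tau-a here.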

Abstract

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings. However, selecting the best-performing VLM for a given downstream application is non-trivial, as it is dataset- and task-dependent. Meanwhile, the exhaustive evaluation of all available VLMs on a novel application is not only time-consuming and computationally demanding but also necessitates the collection of a labeled dataset for evaluation. As the number of open-source VLM variants increases, there is a need for an efficient model selection strategy that does not require access to a curated evaluation dataset. This paper proposes a novel task and benchmark for efficiently evaluating VLMs' zero-shot performance on downstream applications without access to the downstream task dataset.…

Code & Models

Repositories

orrzohar/lovm (PyTorch, official)

Videos

LOVM: Language-Only Vision Model Selection (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling