ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

Uddeshya Upadhyay; Shyamgopal Karthik; Massimiliano Mancini; Zeynep Akata

arXiv:2307.00398 · cs.CV · October 2, 2023

TL;DR

ProbVLM introduces a probabilistic approach to estimating uncertainty in the embeddings of pre-trained vision-language models, improving retrieval, active learning, and model selection without large-scale datasets or retraining the underlying VLM.

Contribution

The paper presents ProbVLM, a novel post-hoc probabilistic adapter that captures embedding uncertainty in frozen VLMs, improving their interpretability and downstream task performance.

Findings

1. ProbVLM outperforms existing methods in uncertainty estimation across four datasets (COCO, Flickr, CUB, and Oxford-flowers).
2. The uncertainty estimates improve retrieval accuracy and model selection.
3. Embedding distributions can be visualized using a large-scale pre-trained latent diffusion model.
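The retrieval finding rests on uncertainty being calibrated: queries the model is less sure about should be retrieved correctly less often. A simple way to check this is to sort queries by predicted uncertainty, split them into bins, and see whether recall falls from bin to bin. The sketch below assumes you already have one scalar uncertainty and one Recall@1 hit (0 or 1) per query; the helper names, the bin count, and the use of a Pearson correlation as the trend summary are illustrative choices, not the paper's exact evaluation protocol.

```python
import math
from typing import List, Sequence


def recall_by_uncertainty_bin(uncertainties: Sequence[float],
                              hits: Sequence[int],
                              n_bins: int = 3) -> List[float]:
    """Sort queries by uncertainty (low to high), split them into n_bins
    near-equal bins, and return the mean hit rate (Recall@1) of each bin."""
    n = len(uncertainties)
    if n != len(hits):
        raise ValueError("uncertainties and hits must have the same length")
    if not 0 < n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of queries")
    order = sorted(range(n), key=lambda i: uncertainties[i])
    recalls = []
    for b in range(n_bins):
        # Integer bin edges: every query lands in exactly one non-empty bin.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        recalls.append(sum(hits[i] for i in idx) / len(idx))
    return recalls


def trend(values: Sequence[float]) -> float:
    """Pearson correlation between bin index and per-bin recall.
    A value near -1 means recall drops steadily as uncertainty grows."""
    k = len(values)
    xs = list(range(k))
    mx = sum(xs) / k
    my = sum(values) / k
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in values))
    if sx == 0 or sy == 0:
        return 0.0  # single bin or flat recall: no trend to measure
    return cov / (sx * sy)
```

A strongly negative trend indicates that the uncertainty scores rank queries by how likely retrieval is to fail, which is what makes them usable for filtering or for deciding which samples to label next.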

Abstract

Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic because multiple samples (images or text) can abstract the same concept in the physical world, so deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner, without needing large-scale datasets or compute. On four challenging datasets, namely COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, namely CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval…
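The central idea, a small adapter on a frozen encoder that outputs a distribution instead of a single vector, can be sketched per embedding dimension. This sketch assumes a generalized Gaussian parameterization (a predicted mean `mu`, scale `alpha`, and shape `beta` for each dimension); that choice is an assumption here, not something stated on this page. The adapter network that would predict these parameters is omitted, and the helper names (`gg_nll`, `gg_variance`, `intra_modal_loss`, `embedding_uncertainty`) are hypothetical. Only an intra-modal term is shown, i.e. how well the predicted distribution explains the frozen VLM's own embedding, plus a scalar uncertainty read off the predicted variances; the cross-modal alignment term is not reproduced.

```python
import math
from typing import Sequence


def gg_nll(x: float, mu: float, alpha: float, beta: float) -> float:
    """Negative log-likelihood of x under a generalized Gaussian GG(mu, alpha, beta).

    Density: beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return ((abs(x - mu) / alpha) ** beta
            - math.log(beta) + math.log(2 * alpha) + math.lgamma(1 / beta))


def gg_variance(alpha: float, beta: float) -> float:
    """Variance of GG(mu, alpha, beta): alpha^2 * Gamma(3 / beta) / Gamma(1 / beta)."""
    return alpha ** 2 * math.exp(math.lgamma(3 / beta) - math.lgamma(1 / beta))


def intra_modal_loss(frozen_emb: Sequence[float], mu: Sequence[float],
                     alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Hypothetical training signal: mean per-dimension NLL of the frozen
    VLM embedding under the distribution predicted by the adapter."""
    dims = len(frozen_emb)
    if not (len(mu) == len(alpha) == len(beta) == dims):
        raise ValueError("all vectors must have the same dimension")
    return sum(gg_nll(x, m, a, b)
               for x, m, a, b in zip(frozen_emb, mu, alpha, beta)) / dims


def embedding_uncertainty(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Scalar uncertainty: mean per-dimension variance (an assumed aggregation)."""
    return sum(gg_variance(a, b) for a, b in zip(alpha, beta)) / len(alpha)
```

Because the frozen encoder only supplies targets, fitting such a head changes none of the VLM's weights, which is what keeps the approach post-hoc. With `beta` fixed at 2 the density reduces to an ordinary Gaussian with variance `alpha**2 / 2`; letting `beta` vary per dimension allows heavier or lighter tails.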

Code & Models

Repositories

explainableml/probvlm (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: BLIP: Bootstrapping Language-Image Pre-training · Diffusion · Adapter · Contrastive Language-Image Pre-training