Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models

Will LeVine, Benjamin Pikus, Pranav Raja, and Fernando Amat Gil

arXiv:2303.12748·cs.CV·April 20, 2023·1 cites

Open Access

TL;DR

This paper measures how well large vision-language models such as CLIP are calibrated in zero-shot inference, finds them miscalibrated, and proposes a modified temperature scaling method in which a single learned temperature per model improves calibration across datasets and prompts.

Contribution

The paper provides the first comprehensive analysis of calibration in zero-shot vision-language models such as CLIP, measured across prompt, dataset, and architecture, and introduces a temperature scaling variant tailored to zero-shot use.

Findings

01

Zero-shot CLIP models are miscalibrated across prompts and datasets.

02

A single learned temperature improves calibration for each specific CLIP model, where a model is defined by its pre-training dataset and architecture.

03

The learned temperature generalizes across inference datasets and prompt choices.
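
As an illustration of what "miscalibrated" means in these findings, the following is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins, a common way to quantify calibration. It is not taken from the paper: the function name, the bin count, and the choice of ECE as the metric are assumptions made for this example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    Illustrative helper (hypothetical name), not the paper's evaluation code.
    `confidences` holds the top-class probability of each prediction and
    `correct` says whether that prediction matched the label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin totals: number of predictions, summed confidence, number correct.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += int(hit)

    ece = 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = hit_sum / count
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece
```

A model that reports 90% confidence but is right only half the time gets a large ECE from this function, while a model whose confidence matches its accuracy in every bin gets an ECE of zero.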

Abstract

Calibration of deep learning models is crucial to their trustworthiness and safe usage, and as such, has been extensively studied in supervised classification models, with methods crafted to decrease miscalibration. However, there has yet to be a comprehensive study of the calibration of vision-language models that are used for zero-shot inference, like CLIP. We measure calibration across relevant variables like prompt, dataset, and architecture, and find that zero-shot inference with CLIP is miscalibrated. Furthermore, we propose a modified version of temperature scaling that is aligned with the common use cases of CLIP as a zero-shot inference model, and show that a single learned temperature generalizes for each specific CLIP model (defined by a chosen pre-training dataset and architecture) across inference dataset and prompt choice.
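
The abstract does not spell out how the paper modifies temperature scaling, so the sketch below shows only standard temperature scaling as a point of reference: one scalar temperature T divides the logits before the softmax, and T is chosen to minimize negative log-likelihood on held-out labeled data. The function names, the grid search, and the candidate range are assumptions made for this example, not the authors' procedure. For a CLIP-style model, each row of logits would be the image-to-prompt similarity scores for one image.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Log-probabilities of softmax(logits / temperature), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def mean_nll(
    logit_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float,
) -> float:
    """Average negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= log_softmax(logits, temperature)[label]
    return total / len(labels)


def fit_temperature(
    logit_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Pick the temperature with the lowest held-out NLL by grid search.

    Assumption: a log-spaced grid from 0.1 to 10 is wide enough; a
    gradient-based optimizer would be the usual choice at scale.
    """
    if candidates is None:
        candidates = [10 ** (k / 20 - 1) for k in range(41)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy held-out set: very confident logits, but one of four predictions is wrong.
held_out_logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
held_out_labels = [0, 0, 1, 1]
temperature = fit_temperature(held_out_logits, held_out_labels)
```

On the toy data the fitted temperature is greater than 1, which softens the overconfident probabilities without changing which class is predicted. The paper's claim is that one such temperature, learned once per CLIP model, carries over to other inference datasets and prompts.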

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training