Robust Calibration of Large Vision-Language Adapters

Balamurali Murugesan, Julio Silva-Rodriguez, Ismail Ben Ayed, and Jose Dolz

arXiv:2407.13588 · cs.CV · July 19, 2024

TL;DR

This paper identifies and addresses the miscalibration issue in CLIP-based model adaptation, especially for out-of-distribution samples, proposing a simple, model-agnostic logit scaling method that improves calibration without sacrificing accuracy.

Contribution

It reveals the cause of miscalibration in CLIP adaptation methods and introduces a straightforward, effective logit scaling technique applicable during inference or adaptation.

Findings

01

Miscalibration worsens with distributional drift in CLIP adaptation methods.

02

Scaling each sample's logit range to that of its zero-shot prediction logits mitigates miscalibration.

03

Proposed methods improve calibration across various OOD benchmarks.
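
This page does not say which calibration metric the paper reports. A common choice is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin. The sketch below is a minimal illustration under that assumption; the function name, the bin count, and the half-open binning convention are choices made here, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |avg confidence - accuracy| per bin.

    confidences: max softmax probability per sample, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    Bins are [k/n_bins, (k+1)/n_bins), with 1.0 placed in the last bin
    (an illustrative convention; implementations differ on bin edges).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Overconfident toy case: 95% confidence but only half the answers are right.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
# Perfectly calibrated toy case: full confidence, all answers right.
calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

A lower value means confidence tracks accuracy more closely; an overconfident model of the kind the findings describe would show a larger gap.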

Abstract

This paper addresses the critical issue of miscalibration in CLIP-based model adaptation, particularly in the challenging scenario of out-of-distribution (OOD) samples, which has been overlooked in the existing literature on CLIP adaptation. We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline in the presence of distributional drift. We identify the increase in logit ranges as the underlying cause of miscalibration of CLIP adaptation methods, contrasting with previous work on calibrating fully-supervised models. Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to its zero-shot prediction logits. We explore three different alternatives…
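
The idea of matching each sample's logit range to that of its zero-shot prediction can be sketched as a per-sample min-max rescaling. This is one plausible reading of the abstract, not the authors' implementation: the abstract is truncated before the three alternatives are described, so the function name `rescale_to_zero_shot_range`, the `eps` guard, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rescale_to_zero_shot_range(adapted, zero_shot, eps=1e-12):
    """Min-max rescale one sample's adapted logits so that their range
    (max - min) equals the range of the zero-shot logits.

    The scale factor is non-negative, so the ranking of classes, and
    therefore the predicted label, is left unchanged.
    """
    a_min, a_max = min(adapted), max(adapted)
    z_min, z_max = min(zero_shot), max(zero_shot)
    scale = (z_max - z_min) / max(a_max - a_min, eps)  # eps avoids divide-by-zero
    return [(x - a_min) * scale + z_min for x in adapted]


# Hypothetical logits for a single 3-class sample.
zero_shot = [2.0, 1.0, 0.5]   # range 1.5
adapted = [9.0, 3.0, 1.0]     # range 8.0 after adaptation
rescaled = rescale_to_zero_shot_range(adapted, zero_shot)

conf_before = max(softmax(adapted))
conf_after = max(softmax(rescaled))
```

In this toy case the predicted class stays the same while the top softmax confidence drops, which is consistent with the claim that calibration can improve without sacrificing accuracy.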

Code & Models

Repositories

Bala93/CLIPCalib
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications

Methods: Contrastive Language-Image Pre-training