Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation

Kun Ding; Qiang Yu; Haojian Zhang; Gaofeng Meng; Shiming Xiang

arXiv:2410.08895·cs.CV·October 14, 2024

TL;DR

This paper introduces a calibrated cache model for few-shot vision-language adaptation, incorporating similarity, weight, and confidence calibrations to improve accuracy and reliability over existing methods.

Contribution

The work proposes three calibration modules, along with variants, that enhance cache-based VLM adaptation by addressing image-image similarity, relations among training samples, and prediction confidence, achieving state-of-the-art results on 11 few-shot classification datasets.

Findings

01

Achieves state-of-the-art performance on 11 few-shot classification datasets.

02

Effectively models training sample relations with Gaussian Process regression.

03

Improves confidence estimation to enhance prediction reliability.
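The second finding can be illustrated with the standard Gaussian Process regression posterior mean, which weights each cached sample through the inverse of the kernel matrix over all training samples, so that redundant or correlated samples are not simply counted twice. The following is a minimal sketch under stated assumptions, not the paper's exact formulation: the function name `gp_cache_logits`, the exponential cosine kernel, and the `beta` and `noise` values are all hypothetical choices made for illustration.

```python
import numpy as np


def gp_cache_logits(query, keys, labels_onehot, beta=5.5, noise=0.1):
    """GP-regression posterior mean over cached few-shot features.

    query:         (Q, D) test image features
    keys:          (N, D) cached training image features
    labels_onehot: (N, C) one-hot labels of the cached samples
    beta, noise:   illustrative hyperparameters (assumed, not from the paper)
    """
    # L2-normalize so that dot products are cosine similarities.
    q = query / np.linalg.norm(query, axis=-1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=-1, keepdims=True)

    def kernel(a, b):
        # Assumed kernel: exp(-beta * (1 - cosine similarity)).
        return np.exp(-beta * (1.0 - a @ b.T))

    k_train = kernel(k, k)   # (N, N) relations among training samples
    k_test = kernel(q, k)    # (Q, N) test-to-training similarities

    # Posterior mean: K_* (K + noise * I)^{-1} Y
    alpha = np.linalg.solve(k_train + noise * np.eye(len(k)), labels_onehot)
    return k_test @ alpha    # (Q, C) class scores
```

With a very small `noise`, the posterior mean evaluated at the training features reproduces their labels, which is a quick sanity check on the sketch.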

Abstract

Cache-based approaches stand out as both effective and efficient for adapting vision-language models (VLMs). Nonetheless, the existing cache model overlooks three crucial aspects. 1) Pre-trained VLMs are mainly optimized for image-text similarity and neglect image-image similarity, which leaves a gap between pre-training and adaptation. 2) The current cache model is based on the Nadaraya-Watson (N-W) estimator, which disregards the intricate relationships among training samples when constructing the weight function. 3) With limited samples, the logits generated by the cache model are highly uncertain, so using them directly without accounting for confidence can be problematic. This work presents three calibration modules aimed at addressing these challenges. Similarity Calibration refines the image-image similarity by using unlabeled images.…
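To make the second point in the abstract concrete, the sketch below shows a generic Nadaraya-Watson estimator over cached features: each test feature receives a class score that is a similarity-weighted average of the cached one-hot labels. It is an illustration under stated assumptions rather than the paper's implementation; the function name `nw_cache_logits`, the exponential cosine kernel, and the `beta` value are hypothetical.

```python
import numpy as np


def nw_cache_logits(query, keys, labels_onehot, beta=5.5):
    """Nadaraya-Watson estimate of class scores from a feature cache.

    query:         (Q, D) test image features
    keys:          (N, D) cached training image features
    labels_onehot: (N, C) one-hot labels of the cached samples
    beta:          kernel sharpness (illustrative value, assumed)
    """
    # L2-normalize so that dot products are cosine similarities.
    q = query / np.linalg.norm(query, axis=-1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=-1, keepdims=True)

    sim = q @ k.T                            # (Q, N) image-image similarity
    weights = np.exp(-beta * (1.0 - sim))    # assumed kernel on the similarity

    # Each weight depends only on one (query, key) pair; relations among
    # the cached samples themselves never enter the estimate.
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ labels_onehot           # (Q, C) class scores
```

Because the weights are normalized per query and the labels are one-hot, each row of the output sums to one. The estimate uses only pairwise query-to-key similarities, which is the limitation the abstract attributes to the N-W-based cache model.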

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · COVID-19 diagnosis using AI

Methods: Residual Connection · Gaussian Process · Contrastive Language-Image Pre-training