Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment

Song-Lin Lv; Yu-Yang Chen; Zhi Zhou; Yu-Feng Li; Lan-Zhe Guo

arXiv:2501.19060·cs.CV·February 6, 2025

Open Access

TL;DR

This paper introduces Contrast-Aware Calibration (CAC), a novel multimodal calibration method for fine-tuned CLIP models that improves confidence calibration on both train and unseen classes without additional training or data analysis.

Contribution

The paper proposes CAC, a contrastive difference-based calibration method that effectively calibrates both train and unseen classes in fine-tuned CLIP models, addressing limitations of existing methods.

Findings

01. CAC outperforms existing calibration methods on 11 datasets.

02. CAC maintains accuracy and inference speed while improving calibration.

03. CAC effectively calibrates both train and unseen classes in various fine-tuning scenarios.

Abstract

Vision-language models (VLMs), such as CLIP, have demonstrated exceptional generalization capabilities and can quickly adapt to downstream tasks through prompt fine-tuning. Unfortunately, in classification tasks involving non-training classes, known as the open-vocabulary setting, fine-tuned VLMs often overfit to train classes, resulting in a misalignment between confidence scores and actual accuracy on unseen classes, which significantly undermines their reliability in real-world deployments. Existing confidence calibration methods typically require training parameters or analyzing features from the training dataset, restricting their ability to generalize to unseen classes without corresponding train data. Moreover, VLM-specific calibration methods rely solely on text features from train classes as calibration indicators, which inherently limits their ability to calibrate train classes. To…
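The "misalignment between confidence scores and actual accuracy" described in the abstract is commonly quantified with the expected calibration error (ECE). The excerpt above does not say which metric the paper reports, so the sketch below is only an illustration of the general idea, not the authors' evaluation code. The function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     booleans, True where the top-class prediction was right
    n_bins:      number of equal-width confidence bins (assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group each prediction into a confidence bin; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's gap by the fraction of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical overconfident predictions: 95% confidence, 25% accuracy.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, False, False, False]
)
```

A model that is overconfident on unseen classes, as the abstract describes for fine-tuned VLMs, would show a large gap between average confidence and accuracy in the high-confidence bins, and therefore a high value from this function.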

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training