Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment
Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo

TL;DR
This paper introduces Contrast-Aware Calibration (CAC), a novel multimodal calibration method for fine-tuned CLIP models that improves confidence calibration on both train and unseen classes without additional training or data analysis.
Contribution
The paper proposes CAC, a contrastive difference-based calibration method that effectively calibrates both train and unseen classes in fine-tuned CLIP models, addressing limitations of existing methods.
Findings
CAC outperforms existing calibration methods on 11 datasets.
CAC maintains accuracy and inference speed while improving calibration.
CAC effectively calibrates both train and unseen classes in various fine-tuning scenarios.
Abstract
Vision-language models (VLMs), such as CLIP, have demonstrated exceptional generalization capabilities and can quickly adapt to downstream tasks through prompt fine-tuning. Unfortunately, in classification tasks involving non-training classes, known as the open-vocabulary setting, fine-tuned VLMs often overfit to train classes, resulting in a misalignment between confidence scores and actual accuracy on unseen classes, which significantly undermines their reliability in real-world deployments. Existing confidence calibration methods typically require training parameters or analyzing features from the training dataset, restricting their ability to generalize to unseen classes without corresponding train data. Moreover, VLM-specific calibration methods rely solely on text features from train classes as calibration indicators, which inherently limits their ability to calibrate train classes. To…
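The "misalignment between confidence scores and actual accuracy" described above is commonly quantified with Expected Calibration Error (ECE). This excerpt does not state which metric the paper reports, so the following is only a minimal sketch of the standard equal-width-bin ECE, not the authors' evaluation code or the CAC method itself. The function name `expected_calibration_error` and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     truthy where the prediction was right, falsy otherwise
    n_bins:      number of equal-width confidence bins (assumed default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if hit else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# An overconfident model: 90% confidence but only half the predictions are right.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

Under this measure, a model that overfits to train classes would typically show a larger value on unseen classes, and a calibration method aims to shrink that value without changing which class is predicted.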
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Contrastive Language-Image Pre-training
