Confidence-calibrated covariate shift correction for few-shot classification in Vision-Language Models
Behraj Khan, Rizwan Qureshi, Nouman Muhammad Durrani, Tahir Syed

TL;DR
This paper introduces CalShift, a unified method to improve vision-language models' robustness and calibration under covariate shift and confidence misalignment in low-shot learning scenarios.
Contribution
CalShift combines a Fisher information penalty with a confidence misalignment penalty to address covariate shift and overconfidence, improving both model calibration and accuracy.
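To make the idea of a two-penalty objective concrete, here is a minimal per-sample sketch in plain Python. It is not the authors' implementation: the exact form of each penalty is not given in this summary, so both are stand-ins. The Fisher term uses the squared gradient norm of the log-likelihood for a linear softmax head (a common proxy for the trace of the empirical Fisher information), and the misalignment term sums the probability mass on wrong classes that outrank the true class. The function names and the weights `lam_fisher` and `lam_cmp` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fisher_penalty(probs, label, features):
    """Assumed proxy: squared gradient norm of log p(y|x) for a linear softmax head.

    For logits = W x, the gradient w.r.t. W is (onehot(y) - p) x^T, whose squared
    Frobenius norm is ||onehot(y) - p||^2 * ||x||^2.
    """
    resid = sum(((1.0 if k == label else 0.0) - p) ** 2 for k, p in enumerate(probs))
    return resid * sum(f * f for f in features)

def misalignment_penalty(probs, label):
    """Hypothetical stand-in: mass on wrong classes that outrank the true class."""
    return sum(p for k, p in enumerate(probs) if k != label and p > probs[label])

def combined_loss(logits, label, features, lam_fisher=0.1, lam_cmp=0.1):
    """Cross-entropy plus the two weighted penalties (weights are illustrative)."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[label])
    return (cross_entropy
            + lam_fisher * fisher_penalty(probs, label, features)
            + lam_cmp * misalignment_penalty(probs, label))
```

In this sketch the misalignment term is zero whenever the true class already has the highest probability, so it only penalizes confident mistakes, while the Fisher term shrinks as predictions become less sensitive to the head's parameters.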
Findings
Achieves up to 5.82% reduction in Expected Calibration Error (ECE)
Improves accuracy by 3.5% on covariate shift benchmarks
Significantly enhances model robustness and reliability
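For readers unfamiliar with the calibration metric cited above, the following sketch computes Expected Calibration Error using the standard equal-width binning definition. The bin count of 10 and the function name are assumptions made for illustration; the paper's evaluation settings are not stated in this summary.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sample-weighted mean of |accuracy - mean confidence| over confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction matched the label
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, h in bucket if h) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece
```

As a usage example, four predictions each made with confidence 0.9, of which three are correct, fall into a single bin with accuracy 0.75, giving an ECE of about 0.15. A lower value means stated confidence tracks observed accuracy more closely.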
Abstract
Since the establishment of vision-language foundation models as the new mainstay in low-shot vision classification tasks, the question of domain generalization arising from insufficient target data is assuming more importance. This scarcity challenge induces sampling bias and amplifies model sensitivity to variations and shifts in data distributions. While fine-tuning on multiple domains could mitigate such domain generalization issues, it is resource-intensive and demands diverse data sources. In this work, we systematically analyze two critical challenges: (1) covariate shift between the pre-training distribution and the underspecified target distribution, and (2) confidence misalignment, where predictions on novel data are overconfident. To address both challenges simultaneously, we introduce Confidence-Calibrated Covariate Shift Correction (CalShift), a unified…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques
