Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Changdae Oh; Hyesu Lim; Mijoo Kim; Dongyoon Han; Sangdoo Yun; Jaegul Choo; Alexander Hauptmann; Zhi-Qi Cheng; Kyungwoo Song

arXiv:2311.01723 · cs.CV · November 8, 2024 · 1 citation

TL;DR

This paper introduces a robust fine-tuning method for vision-language models that enhances out-of-distribution accuracy and confidence calibration by leveraging a novel theoretical insight and a constrained contrastive loss.

Contribution

It presents a fine-tuning framework that improves OOD accuracy and calibration by enforcing a larger smallest singular value of the ID input covariance matrix during fine-tuning, guided by self-distillation.

Findings

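01. Improved OOD accuracy on ImageNet benchmarks.

02. Enhanced confidence calibration in vision-language models.

03. A shared upper bound on OOD classification and OOD calibration errors, expressed in terms of the ID calibration error and the smallest singular value of the ID input covariance matrix.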
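
To make "confidence calibration" concrete, the sketch below computes the expected calibration error (ECE), one common calibration metric. The page does not say which metric the paper reports, so ECE, the function name, and the equal-width binning with 10 bins are assumptions made for illustration only.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (assumed metric)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Samples whose top-class confidence falls in (lo, hi].
        # A max-softmax confidence is always > 0, so nothing is dropped.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between accuracy and mean confidence inside the bin,
            # weighted by the fraction of samples in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

A model that predicts with confidence 0.9 but is right only 75% of the time is over-confident by 0.15, and this value is what the function returns for such a batch.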

Abstract

Improving out-of-distribution (OOD) generalization during in-distribution (ID) adaptation is a primary goal of robust fine-tuning of zero-shot models beyond naive fine-tuning. However, despite decent OOD generalization performance from recent robust fine-tuning methods, confidence calibration for reliable model output has not been fully addressed. This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. Firstly, we show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data: 1) ID calibration error and 2) the smallest singular value of the ID input covariance matrix. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value,…
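
The second term of the bound, the smallest singular value of the ID input covariance matrix, can be estimated from a batch of input representations. The following is a minimal sketch, not the authors' implementation: the function name is hypothetical, and centering the features and using the unbiased (n - 1) normalization are assumptions that the abstract does not specify.

```python
import numpy as np

def smallest_singular_value_of_covariance(features):
    """Smallest singular value of the sample covariance of `features`.

    `features` is an (n_samples, n_dims) array. Centering and the
    (n - 1) normalization are assumptions made for this sketch.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    # np.linalg.svd returns singular values in descending order.
    singular_values = np.linalg.svd(cov, compute_uv=False)
    return float(singular_values[-1])
```

A value near zero indicates that the representations collapse onto a lower-dimensional subspace. Per the abstract, the proposed constrained loss pushes this value up during fine-tuning.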

Code & Models

Repositories

MLAI-Yonsei/CaRot (PyTorch, official)

Videos

Towards Calibrated Robust Fine-Tuning of Vision-Language Models (SlidesLive)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI