Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality

Qitong Wang; Tang Li; Kien X. Nguyen; Xi Peng

arXiv:2412.13333 · cs.LG · February 26, 2025

TL;DR

This paper investigates how fine-tuning vision-language models affects their prediction rationality. Fine-tuned models are more likely to predict correctly when they rely on valid evidence, but fine-tuning also yields more correct predictions that rest on invalid evidence, which undermines the trustworthiness of those predictions.

Contribution

The study introduces two new metrics for assessing prediction rationality, Prediction Trustworthiness and Inference Reliability, and provides extensive experimental analysis of how fine-tuning affects VLMs intended for safety-critical domains.

Findings

01

Fine-tuning can lead to more correct predictions based on invalid evidence.

02

Fine-tuned VLMs are more likely to make correct predictions when valid evidence is present.

03

Results are consistent across different settings and distributional shifts.

Abstract

Vision-Language Models (VLMs), such as CLIP, have already seen widespread applications. Researchers actively engage in further fine-tuning VLMs in safety-critical domains. In these domains, prediction rationality is crucial: the prediction should be correct and based on valid evidence. Yet, for VLMs, the impact of fine-tuning on prediction rationality is seldom investigated. To study this problem, we proposed two new metrics called Prediction Trustworthiness and Inference Reliability. We conducted extensive experiments on various settings and observed some interesting phenomena. On the one hand, we found that the well-adopted fine-tuning methods led to more correct predictions based on invalid evidence. This potentially undermines the trustworthiness of correct predictions from fine-tuned VLMs. On the other hand, having identified valid evidence of target objects, fine-tuned VLMs were…
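
The abstract names the two metrics without defining them on this page. As a rough illustration only, the sketch below assumes a reading suggested by the findings: Prediction Trustworthiness as the share of correct predictions that rest on valid evidence, and Inference Reliability as the share of valid-evidence predictions that are correct. These definitions, the `Sample` record, and both function names are assumptions for illustration, not the paper's formulation or the repository's API. How "valid evidence" is decided per sample is outside this sketch and is taken as a given boolean.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-sample record; both flags are assumed to be given."""
    correct: bool          # did the model predict the right class?
    valid_evidence: bool   # was the prediction judged to rest on valid evidence?


def prediction_trustworthiness(samples):
    """Assumed reading: fraction of correct predictions backed by valid evidence."""
    correct = [s for s in samples if s.correct]
    if not correct:
        return math.nan  # undefined when there are no correct predictions
    return sum(s.valid_evidence for s in correct) / len(correct)


def inference_reliability(samples):
    """Assumed reading: fraction of valid-evidence predictions that are correct."""
    valid = [s for s in samples if s.valid_evidence]
    if not valid:
        return math.nan  # undefined when no prediction used valid evidence
    return sum(s.correct for s in valid) / len(valid)


# Toy data: three correct predictions, only one of them on valid evidence,
# plus one wrong prediction that nevertheless used valid evidence.
toy = [
    Sample(correct=True, valid_evidence=True),
    Sample(correct=True, valid_evidence=False),
    Sample(correct=True, valid_evidence=False),
    Sample(correct=False, valid_evidence=True),
]
pt = prediction_trustworthiness(toy)
ir = inference_reliability(toy)
```

Under this reading the two quantities can move independently: a model can gain accuracy while its share of correct predictions on valid evidence drops, which is the pattern the findings describe.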

Code & Models

Repositories

deep-real/vlm-pred-rationality
PyTorch · Official

Taxonomy

Topics: Geographic Information Systems Studies · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Contrastive Language-Image Pre-training