Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization

Yuhang Zang; Hanlin Goh; Josh Susskind; Chen Huang

arXiv:2401.15914·cs.CV·April 17, 2024·1 cites

TL;DR

This paper introduces OGEN, a novel method that uses class-conditional feature generation and self-distillation to improve out-of-distribution generalization in vision-language model finetuning, addressing overfitting to known classes.

Contribution

The paper proposes OGEN, a new approach combining feature synthesis and adaptive self-distillation to enhance OOD generalization during vision-language model finetuning.
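As a rough illustration of the self-distillation half of this idea, the sketch below adds a penalty that keeps a finetuned "student" close to the predictions of a "teacher" copy of the same model. It is a generic self-distillation loss under stated assumptions, not the OGEN implementation: the function names, the fixed `weight`, and the `temperature` parameter are all hypothetical, and the adaptive scheme and feature generator from the paper are not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard task loss on the known (in-distribution) classes."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distilled_loss(student_logits, teacher_logits, label,
                   weight=0.5, temperature=1.0):
    """Task loss plus a penalty for drifting away from the teacher.

    Assumption: the teacher is an earlier or averaged snapshot of the
    same model, and `weight` is a fixed hyperparameter chosen here only
    for illustration.
    """
    task = cross_entropy(student_logits, label)
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return task + weight * kl_divergence(teacher, student)
```

When the student and teacher logits agree, the penalty vanishes and only the task loss remains; the further the student drifts, the larger the total loss, which is the regularizing effect that discourages overfitting to known classes.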

Findings

01 OGEN improves OOD generalization performance across various settings.

02 The method effectively prevents overfitting to known classes.

03 Synthesized features aid in regularizing decision boundaries.

Abstract

Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner, and thus struggle to handle open-domain visual concepts by design. There are recent finetuning methods, such as prompt learning, that not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples, but also show some improvements in both ID and OOD accuracies. In this paper, we first demonstrate that vision-language models, after long enough finetuning but without proper regularization, tend to overfit the known classes in the given dataset, with degraded performance on unknown classes. Then we propose a novel approach OGEN to address this pitfall, with the main focus on improving the OOD GENeralization of finetuned models. Specifically, a class-conditional feature…

Code & Models

Repositories

apple/ml-ogen (PyTorch, official)

Videos

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization (SlidesLive)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus