Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Wenshuo Peng; Kaipeng Zhang; Yue Yang; Hao Zhang; Yu Qiao

arXiv:2407.08787·cs.CV·September 27, 2024

TL;DR

This paper introduces Data Adaptive Traceback (DAT), an adaptation framework for vision-language models that selectively reuses pre-training data to improve downstream image classification, targeting the knowledge that pre-training misses because of weakly correlated image-text pairs.

Contribution

The paper proposes a new adaptation framework called DAT that extracts task-related data subsets and reuses pre-training images using semi-supervised and contrastive learning techniques.

Findings

01 DAT significantly improves performance on benchmark datasets.

02 The approach effectively addresses the weak image-text correlation problem in pre-training data.

03 In experiments, DAT outperforms traditional adaptation methods.

Abstract

Vision-language foundation models have been incredibly successful in a wide range of downstream computer vision tasks using adaptation methods. However, due to the high cost of obtaining pre-training datasets, these datasets contain large numbers of pairs with weak image-text correlation. We call them weak-paired samples. Because of these weak-paired samples, the pre-trained model is unable to mine all the knowledge from the pre-training data. Existing adaptation methods do not consider this missing knowledge, so crucial task-related knowledge for the downstream tasks may be ignored. To address this issue, we propose a new adaptation framework called Data Adaptive Traceback (DAT). Specifically, we utilize a zero-shot-based method to extract the subset of the pre-training data most related to the downstream task. Furthermore, we adopt a…
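
The zero-shot extraction step mentioned in the abstract can be pictured as ranking pre-training images by how well they match the downstream class prompts. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes image embeddings and class-prompt text embeddings have already been computed by a vision-language model, and the function name `select_task_related_subset`, the `top_k` parameter, and the max-over-classes cosine score are hypothetical choices made for illustration.

```python
import numpy as np


def select_task_related_subset(image_emb, class_text_emb, top_k):
    """Rank pre-training images by zero-shot similarity to downstream classes.

    Hypothetical helper, for illustration only.
    image_emb:      (N, D) array of pre-training image embeddings (assumed given).
    class_text_emb: (C, D) array of downstream class-prompt embeddings (assumed given).
    top_k:          number of images to keep.
    Returns (indices, pseudo_labels, scores) for the kept images, best first.
    """
    eps = 1e-12  # guards against division by zero for all-zero embeddings
    img = image_emb / np.maximum(
        np.linalg.norm(image_emb, axis=1, keepdims=True), eps
    )
    txt = class_text_emb / np.maximum(
        np.linalg.norm(class_text_emb, axis=1, keepdims=True), eps
    )

    # Cosine similarity of every image to every class prompt: shape (N, C).
    sims = img @ txt.T

    # Zero-shot prediction per image and the confidence of that prediction.
    pseudo_labels = sims.argmax(axis=1)
    scores = sims.max(axis=1)

    # Keep the top_k images whose best class match is strongest.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, pseudo_labels[order], scores[order]


if __name__ == "__main__":
    # Random stand-in embeddings; real ones would come from the model's encoders.
    rng = np.random.default_rng(0)
    images = rng.normal(size=(1000, 64))
    prompts = rng.normal(size=(10, 64))
    idx, labels, conf = select_task_related_subset(images, prompts, top_k=50)
    print(idx.shape, labels.shape, conf.shape)
```

In this sketch the pseudo-labels come for free from the same similarity matrix, which is one plausible way a selected subset could then feed a semi-supervised objective; how DAT actually scores, thresholds, and reuses the selected samples is specified in the paper, not here.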

Taxonomy

Methods: Contrastive Learning