TL;DR
This paper introduces Data Adaptive Traceback (DAT), an adaptation framework for vision-language models that selectively reuses pre-training data to improve downstream image classification, targeting knowledge that pre-training missed because of weakly correlated image-text pairs.
Contribution
The paper proposes a new adaptation framework called DAT that extracts task-related data subsets and reuses pre-training images using semi-supervised and contrastive learning techniques.
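The subset-extraction step can be illustrated with a minimal sketch. It assumes that image embeddings for the pre-training data and text embeddings for the downstream class prompts are already available from a CLIP-style model. The function names, the temperature of 0.07, and the per-class top-k rule are hypothetical choices made for illustration; the summary does not specify how DAT scores or ranks samples, so this should not be read as the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(image_emb, class_embs):
    """Cosine similarity between one image embedding and every class embedding."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(c))) for c in class_embs]


def softmax(scores, temperature=0.07):
    """Temperature-scaled softmax; 0.07 is an assumed value, not taken from the paper."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_task_related_subset(image_embs, class_embs, per_class_k):
    """Zero-shot classify each pre-training image against the downstream classes,
    then keep the per_class_k most confident images for each predicted class.

    Returns {class_index: [image indices, most confident first]}.
    """
    ranked = {c: [] for c in range(len(class_embs))}
    for idx, emb in enumerate(image_embs):
        probs = softmax(cosine_scores(emb, class_embs))
        pred = max(range(len(probs)), key=probs.__getitem__)
        ranked[pred].append((probs[pred], idx))
    return {
        c: [i for _, i in sorted(items, reverse=True)[:per_class_k]]
        for c, items in ranked.items()
    }


# Toy 2-D embeddings standing in for real model outputs.
class_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [1.0, 0.0]]
subset = select_task_related_subset(image_embs, class_embs, per_class_k=2)
```

In this toy run, three images are predicted as class 0 and only the two most confident are kept, while the ambiguous image at index 2 is dropped. In a real pipeline the selected images would then feed the semi-supervised and contrastive stages described above.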
Findings
DAT significantly improves performance on benchmark image classification datasets.
The approach recovers task-related knowledge that pre-training missed because of weakly correlated image-text pairs.
In the reported experiments, DAT outperforms traditional adaptation methods.
Abstract
Vision-language foundation models have been incredibly successful in a wide range of downstream computer vision tasks using adaptation methods. However, due to the high cost of obtaining pre-training datasets, pairs with weak image-text correlation exist in large numbers in the data. We call them weak-paired samples. Due to the limitations of these weak-paired samples, pre-trained models are unable to mine all the knowledge from the pre-training data. Existing adaptation methods do not consider this missing knowledge, so crucial task-related knowledge for the downstream tasks may be ignored. To address this issue, we propose a new adaptation framework called Data Adaptive Traceback (DAT). Specifically, we utilize a zero-shot-based method to extract the subset of the pre-training data most related to the downstream task and use it to support that task. Furthermore, we adopt a…
Taxonomy
Methods: Contrastive Learning
