Pre-Trained Vision-Language Models as Partial Annotators

Qian-Wei Wang; Yuqiu Xie; Letian Zhang; Zimo Liu; Shu-Tao Xia

arXiv:2406.18550 · cs.CV · June 28, 2024 · 1 citation

TL;DR

This paper introduces a novel weakly-supervised learning approach using pre-trained vision-language models like CLIP to generate partial labels for image classification, improving performance without extra labeling effort.

Contribution

It proposes a collaborative label purification and self-training framework leveraging noisy partial labels from CLIP, enhancing downstream image classification performance.

Findings

01. Achieves significantly better results than zero-shot inference.

02. Outperforms existing weakly supervised and few-shot methods.

03. Produces smaller, efficient deployed models.

Abstract

Pre-trained vision-language models learn from massive data to build unified representations of images and natural language, which can be widely applied to downstream machine learning tasks. Beyond zero-shot inference, pre-trained models are usually adapted to downstream tasks with methods such as few-shot or parameter-efficient fine-tuning and knowledge distillation. However, annotating samples is laborious, while large numbers of unlabeled samples are easy to obtain. In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for applying pre-trained models and experiment on image classification tasks. Specifically, based on CLIP, we annotate image samples with multiple prompt templates to obtain multiple candidate labels, which form a noisy partial-label dataset, and design a collaborative…
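
The candidate-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that each prompt template contributes its top-scoring class and that the candidate set for an image is the union of those predictions; the abstract does not spell out the exact rule, so this is an assumption rather than the authors' procedure. The function name `candidate_labels` and the score values are hypothetical, and in practice the scores would come from CLIP image-text similarities.

```python
from typing import List, Set


def candidate_labels(scores_per_template: List[List[List[float]]]) -> List[Set[int]]:
    """Build one candidate label set per image.

    scores_per_template[t][i][c] is the (hypothetical) similarity between
    image i and class c when class names are wrapped in prompt template t.
    Assumption: each template votes for its top-1 class, and the candidate
    set is the union of those votes.
    """
    num_images = len(scores_per_template[0])
    candidates: List[Set[int]] = [set() for _ in range(num_images)]
    for template_scores in scores_per_template:
        for i, class_scores in enumerate(template_scores):
            # Index of the highest-scoring class under this template.
            best = max(range(len(class_scores)), key=class_scores.__getitem__)
            candidates[i].add(best)
    return candidates


# Made-up scores: 2 templates, 2 images, 3 classes.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]],  # template 0
    [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]],  # template 1
]
print(candidate_labels(scores))  # [{0}, {1, 2}]
```

When the templates agree, as for the first image, the candidate set holds a single label. When they disagree, as for the second image, the set holds several labels and may not contain the true class at all, which is why the resulting dataset is described as noisy partial-label data.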

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training · Contrastive Learning