Learning from True-False Labels via Multi-modal Prompt Retrieving

Zhongnian Li; Jinghao Xu; Peng Ying; Meng Wei; Xinzheng Xu

arXiv:2405.15228·cs.LG·June 4, 2025

TL;DR

This paper introduces True-False Labels (TFLs), a new weakly supervised labeling setting in which the labels are generated by VLMs, along with a Multi-modal Prompt Retrieving (MRP) method, to improve label accuracy and task performance.

Contribution

It proposes the TFL setting and a convolution-based Multi-modal Prompt Retrieving (MRP) method, which strengthens weak supervision and bridges the gap between VLM knowledge and target learning tasks.

Findings

01 The TFL setting achieves high label accuracy when the labels are generated by VLMs.

02 The MRP method makes effective use of multi-modal prompts.

03 Experimental results validate the effectiveness of the approach.

Abstract

Pre-trained Vision-Language Models (VLMs) exhibit strong zero-shot classification abilities, demonstrating great potential for generating weakly supervised labels. Unfortunately, existing weakly supervised learning methods struggle to generate accurate labels via VLMs. In this paper, we propose a novel weakly supervised labeling setting, namely True-False Labels (TFLs), which can achieve high accuracy when generated by VLMs. A TFL indicates whether an instance belongs to a given label, where that label is sampled uniformly at random from the candidate label set. Specifically, we theoretically derive a risk-consistent estimator to explore and utilize the conditional probability distribution information of TFLs. In addition, we propose a convolution-based Multi-modal Prompt Retrieving (MRP) method to bridge the gap between the knowledge of VLMs and target learning tasks. Experimental…
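The TFL construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `make_true_false_label`, the toy class names, and the use of a known ground-truth label as the True/False oracle are all assumptions made for illustration. In the paper's setting, the True/False answer would come from a VLM rather than from ground truth.

```python
import random

def make_true_false_label(true_label, candidate_labels, rng):
    """Return (sampled_label, flag) for one instance.

    Hypothetical helper: samples one label uniformly at random from the
    candidate set and reports whether the instance belongs to it.
    The ground-truth comparison stands in for a VLM's True/False answer.
    """
    sampled = rng.choice(candidate_labels)   # uniform over the candidate set
    flag = int(sampled == true_label)        # 1 = "True", 0 = "False"
    return sampled, flag

rng = random.Random(0)                       # seeded for reproducibility
candidates = ["cat", "dog", "bird", "fish"]  # toy candidate label set
true_labels = ["cat", "dog", "dog", "fish", "bird"]

# One TFL per instance: a sampled label plus a True/False flag.
tfls = [make_true_false_label(y, candidates, rng) for y in true_labels]
for y, (sampled, flag) in zip(true_labels, tfls):
    print(f"true={y:5s} sampled={sampled:5s} TFL={'True' if flag else 'False'}")
```

With k candidate labels and a perfect oracle, roughly 1/k of the instances receive a "True" flag, which pins down their label exactly. A "False" flag only rules out the single sampled label, so the remaining information has to be recovered by the learning method.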

Code & Models

Repositories

tranquilxu/tmp (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling