Vision-Language Models are Strong Noisy Label Detectors

Tong Wei; Hao-Tian Li; Chun-Shu Li; Jiang-Xin Shi; Yu-Feng; Li; Min-Ling Zhang

arXiv:2409.19696·cs.LG·October 1, 2024·3 cites

Vision-Language Models are Strong Noisy Label Detectors

Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng, Li, Min-Ling Zhang

PDF

Open Access 1 Repo 1 Video

TL;DR

This paper introduces DeFT, a framework that leverages vision-language models' alignment capabilities to effectively detect and handle noisy labels during fine-tuning, improving downstream task performance.

Contribution

DeFT is a novel framework that uses textual prompts and parameter-efficient fine-tuning to identify noisy labels in vision-language models.

Findings

01

DeFT achieves high accuracy in noisy label detection.

02

DeFT improves image classification performance on noisy datasets.

03

DeFT is adaptable to various pre-trained models and datasets.

Abstract

Recent research on fine-tuning vision-language models has demonstrated impressive performance in various downstream tasks. However, the challenge of obtaining accurately labeled data in real-world applications poses a significant obstacle during the fine-tuning process. To address this challenge, this paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. The proposed framework establishes a noisy label detector by learning positive and negative textual prompts for each class. The positive prompt seeks to reveal distinctive features of the class, while the negative prompt serves as a learnable threshold for separating clean and noisy samples. We employ parameter-efficient fine-tuning for the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

HotanLee/DeFT
pytorchOfficial

Videos

Vision-Language Models are Strong Noisy Label Detectors· slideslive

Taxonomy

TopicsText and Document Classification Technologies · Natural Language Processing Techniques · Handwritten Text Recognition Techniques