Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

Filip Radenovic; Abhimanyu Dubey; Abhishek Kadian; Todor Mihaylov; Simon Vandenhende; Yash Patel; Yi Wen; Vignesh Ramanathan; Dhruv Mahajan

arXiv:2301.02280·cs.CV·March 31, 2023

TL;DR

This paper improves vision-language contrastive pre-training by filtering noisy data, leveraging strong unimodal representations, and emphasizing hard negatives, yielding gains on 20 of 29 zero-shot tasks and in few-shot linear probing.

Contribution

The paper introduces the Complexity, Action, and Text-spotting (CAT) filtering strategy, Concept Distillation, and an importance-sampling method for hard negatives, advancing the state of the art in contrastive vision-language models.

Findings

01

Improved performance on 20 out of 29 zero-shot tasks.

02

Significant gains in few-shot linear probing accuracy.

03

CAT filtering significantly reduces dataset size while improving zero-shot performance, and Concept Distillation does not increase training complexity.

Abstract

Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems. In this paper we improve the following three aspects of the contrastive pre-training pipeline: dataset noise, model initialization and the training objective. First, we propose a straightforward filtering strategy titled Complexity, Action, and Text-spotting (CAT) that significantly reduces dataset size, while achieving improved performance across zero-shot vision-language tasks. Next, we propose an approach titled Concept Distillation to leverage strong unimodal representations for contrastive training that does not increase training complexity while outperforming prior work. Finally, we modify the traditional contrastive alignment objective, and propose an importance-sampling approach to up-sample the importance of hard-negatives…
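The abstract is truncated before it states the modified objective, so the following is only a minimal sketch of how importance sampling can up-weight hard negatives in an image-to-text InfoNCE loss. The function name `hard_negative_info_nce`, the temperature `tau`, the weighting strength `beta`, and the exact weighting formula are assumptions made for illustration; they are not taken from the text above and may differ from the authors' implementation in facebookresearch/diht.

```python
import math


def hard_negative_info_nce(sim, tau=0.1, beta=0.5):
    """Image-to-text contrastive loss with importance-weighted negatives.

    sim  : n x n nested list; sim[i][j] is the similarity between image i
           and text j, with sim[i][i] the matched (positive) pair.
    tau  : softmax temperature (assumed hyperparameter).
    beta : strength of hard-negative up-weighting (assumed hyperparameter);
           beta = 0 gives every negative weight 1, i.e. standard InfoNCE.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = math.exp(sim[i][i] / tau)
        neg_idx = [j for j in range(n) if j != i]

        # Importance weights: negatives that look more similar to image i
        # get larger weights. Weights are normalised to sum to n - 1, so
        # their mean is 1 and the overall scale matches the unweighted loss.
        raw = [math.exp(beta * sim[i][j] / tau) for j in neg_idx]
        z = sum(raw)
        weights = [(n - 1) * r / z for r in raw]

        neg = sum(w * math.exp(sim[i][j] / tau)
                  for w, j in zip(weights, neg_idx))
        total += -math.log(pos / (pos + neg))
    return total / n


if __name__ == "__main__":
    # Toy batch of three image-text pairs; text 1 is a hard negative
    # for image 0 (similarity 0.8 versus 0.9 for the true pair).
    sim = [
        [0.9, 0.8, 0.1],
        [0.2, 0.9, 0.1],
        [0.1, 0.3, 0.9],
    ]
    print("standard InfoNCE :", hard_negative_info_nce(sim, beta=0.0))
    print("hard-neg weighted:", hard_negative_info_nce(sim, beta=0.5))
```

Because the weights grow with similarity and average to 1, the weighted loss is never smaller than the standard one on the same batch, which is the sense in which hard negatives contribute more to the gradient in this sketch.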

Code & Models

Repositories

facebookresearch/diht
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Learning