Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Yangguang Li; Feng Liang; Lichen Zhao; Yufeng Cui; Wanli Ouyang; Jing Shao; Fengwei Yu; Junjie Yan

arXiv:2110.05208 · cs.CV · March 15, 2022 · 127 citations

TL;DR

DeCLIP introduces a data-efficient contrastive learning paradigm that leverages multiple supervision types to improve visual feature learning, achieving comparable or better performance than CLIP with significantly less data.

Contribution

The paper proposes DeCLIP, a novel training paradigm that fully exploits supervision signals within and across modalities to reduce data requirements for effective contrastive pre-training.
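Self-supervision within a single modality, one of the signal types this contribution refers to, can be illustrated for images with a SimSiam-style objective: two augmented views of the same image should map to the same direction in embedding space. The sketch below is a hedged illustration, not the authors' code. The choice of SimSiam and the names `p1`, `z1`, `p2`, `z2` (predictor and projector outputs for each view) are our assumptions, and plain Python lists stand in for tensors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def simsiam_loss(p1: Vector, z1: Vector, p2: Vector, z2: Vector) -> float:
    """Symmetric negative-cosine loss between two augmented views of one image.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: projector outputs for view 1 and view 2.
    In real training z1 and z2 would sit behind a stop-gradient; this sketch
    only computes the scalar loss value, so no gradients are involved.
    """
    # Each view's prediction is compared with the other view's projection.
    return -0.5 * (cosine(p1, z2) + cosine(p2, z1))
```

Calling `simsiam_loss([1.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0])` returns -1.0 because every vector points the same way, while orthogonal predictions and projections give 0.0.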

Findings

01

DeCLIP-ResNet50 achieves 60.4% zero-shot ImageNet accuracy, surpassing CLIP-ResNet50.

02

DeCLIP uses 7.1× fewer image-text pairs than CLIP to reach similar performance.

03

DeCLIP-ResNet50 outperforms CLIP-ResNet50 on 8 of 11 downstream visual tasks.
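Finding 02 can be turned into an approximate data budget by combining it with the 400M image-text pairs the abstract gives for CLIP. This is our back-of-the-envelope arithmetic, assuming the 7.1× ratio is measured against CLIP's full 400M-pair set:

```latex
N_{\text{DeCLIP}} \approx \frac{N_{\text{CLIP}}}{7.1} = \frac{400\,\text{M}}{7.1} \approx 56\,\text{M image-text pairs}
```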

Abstract

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks. However, CLIP is quite data-hungry and requires 400M image-text pairs for pre-training, thereby restricting its adoption. This work proposes a novel training paradigm, Data efficient CLIP (DeCLIP), to alleviate this limitation. We demonstrate that by carefully utilizing the widespread supervision among the image-text pairs, our DeCLIP can learn generic visual features more efficiently. Instead of using the single image-text contrastive supervision, we fully exploit data potential through the use of (1) self-supervision within each modality; (2) multi-view supervision across modalities; (3) nearest-neighbor supervision from other similar pairs. Benefiting from intrinsic supervision,…
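The abstract names multi-view and nearest-neighbor supervision but gives no formulas. The sketch below is our hedged illustration of both, built on a symmetric CLIP-style InfoNCE loss; it is not the authors' implementation. Assumptions: a fixed temperature `tau = 0.07`; a multi-view term that averages the contrastive loss over the four pairings of two augmented image views and two augmented text views; and a nearest-neighbor term that, for each text, looks up the closest entry in a queue of previously seen text embeddings (`queue`, hypothetical) and uses it as the positive for the paired image. Plain Python lists stand in for tensors, and no gradients are computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _row_cross_entropy(logits: List[List[float]]) -> float:
    """Mean of -log softmax(row)[i] over rows, i.e. the diagonal is the positive."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(images: Sequence[Vector], texts: Sequence[Vector], tau: float = 0.07) -> float:
    """Symmetric CLIP-style contrastive loss; images[i] and texts[i] form the positive pair."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    # Image-to-text similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(im, tx)) / tau for tx in txts] for im in imgs]
    transposed = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_row_cross_entropy(logits) + _row_cross_entropy(transposed))


def multi_view_loss(img_a: Sequence[Vector], img_b: Sequence[Vector],
                    txt_a: Sequence[Vector], txt_b: Sequence[Vector],
                    tau: float = 0.07) -> float:
    """Average the contrastive loss over all four pairings of two image views and two text views."""
    pairs = [(img_a, txt_a), (img_a, txt_b), (img_b, txt_a), (img_b, txt_b)]
    return sum(info_nce(i, t, tau) for i, t in pairs) / len(pairs)


def nearest_neighbor(query: Vector, queue: Sequence[Vector]) -> List[float]:
    """Return the unit-normalized queue entry with the highest cosine similarity to query."""
    q = normalize(query)
    candidates = [normalize(v) for v in queue]
    return max(candidates, key=lambda v: sum(a * b for a, b in zip(q, v)))


def nn_loss(images: Sequence[Vector], texts: Sequence[Vector],
            queue: Sequence[Vector], tau: float = 0.07) -> float:
    """Contrast each image with the nearest queued neighbor of its paired text."""
    neighbors = [nearest_neighbor(t, queue) for t in texts]
    return info_nce(images, neighbors, tau)
```

With matched pairs such as images `[[1.0, 0.0], [0.0, 1.0]]` and identical texts, `info_nce` is close to zero; swapping the two texts makes it large, and when every embedding is identical the loss equals log n. Pure-Python lists keep the sketch dependency-free; a practical version would use batched tensor operations and refresh the queue every step.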

Videos

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (SlidesLive talk)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training