Improving CLIP Training with Language Rewrites

Lijie Fan; Dilip Krishnan; Phillip Isola; Dina Katabi; Yonglong Tian

arXiv:2305.20088 · cs.CV · October 31, 2023 · 35 cites

TL;DR

This paper introduces LaCLIP, a method that enhances CLIP training by using language rewrites generated by large language models to diversify text descriptions, leading to significant improvements in transfer performance.

Contribution

The paper proposes a language augmentation technique for CLIP training based on language rewrites, which improves transfer accuracy without adding computation or memory overhead during training.

Findings

01. LaCLIP outperforms CLIP in zero-shot ImageNet accuracy by up to 8.2% (reported for pre-training on CC12M).

02. Language rewrites increase the diversity of text inputs, enhancing model robustness.

03. The method requires no extra computation or memory during training.

Abstract

Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text data. CLIP models are trained using contrastive loss, which typically relies on data augmentations to prevent overfitting and shortcuts. However, in the CLIP training paradigm, data augmentations are exclusively applied to image inputs, while language inputs remain unchanged throughout the entire training process, limiting the exposure of diverse texts to the same image. In this paper, we introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhance CLIP training through language rewrites. Leveraging the in-context learning capability of large language models, we rewrite the text descriptions associated with each image. These rewritten texts exhibit diversity in sentence structure and…
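The text-side augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the rewrites have already been generated offline by a language model, and that each training step draws one caption uniformly at random from the original plus its rewrites. The function name `sample_caption` and the toy captions are hypothetical.

```python
import random
from typing import List, Optional


def sample_caption(
    original: str,
    rewrites: List[str],
    rng: Optional[random.Random] = None,
) -> str:
    """Return one caption drawn uniformly from the original and its rewrites.

    Assumption: uniform sampling over all candidates. With no rewrites,
    this reduces to standard CLIP training on the original caption.
    """
    chooser = rng if rng is not None else random
    candidates = [original] + list(rewrites)
    return chooser.choice(candidates)


# Hypothetical toy example: one image with one original caption and
# two precomputed rewrites (made up for illustration).
original_caption = "a dog running on the beach"
rewritten_captions = [
    "a dog sprints across the sand by the sea",
    "on the beach, a dog is running",
]

rng = random.Random(0)  # seeded so repeated runs draw the same sequence
for step in range(3):
    # In a real data loader this call would sit next to the image
    # augmentation, so each step pairs the image with a fresh text.
    print(step, sample_caption(original_caption, rewritten_captions, rng))
```

Because the rewrites are computed once ahead of time, the only work added per step in this sketch is a random choice from a short list, which fits the claim that training incurs no extra computation or memory.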

Code & Models

Repositories

lijiefan/laclip (PyTorch, official)

Videos

Improving CLIP Training with Language Rewrites · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training