Reproducible scaling laws for contrastive language-image learning

Mehdi Cherti; Romain Beaumont; Ross Wightman; Mitchell Wortsman; Gabriel Ilharco; Cade Gordon; Christoph Schuhmann; Ludwig Schmidt; Jenia Jitsev

arXiv:2212.07143 · cs.LG · July 16, 2024 · 29 cites

Open Access 5 Repos 10 Models

TL;DR

This paper investigates how contrastive language-image (CLIP) models scale with data, model size, and compute using public datasets, finds that the training distribution strongly shapes scaling behavior, and provides open-source tools for reproducibility.

Contribution

It presents the first large-scale study of scaling laws for CLIP models trained on public data, highlighting the impact of training distribution and providing open-source resources.

Findings

01 Power-law scaling is observed across multiple downstream tasks.

02 The training distribution significantly affects scaling behavior.

03 Open-source models and an evaluation workflow are provided.

Abstract

Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data and models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks including zero-shot classification, retrieval, linear probing, and end-to-end fine-tuning. We find that the training distribution…
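A power law of the kind mentioned in the abstract is often written as E = β·C^α, where E is a downstream error and C is a scale variable such as compute, samples seen, or model size. The sketch below assumes that generic form (the exact functional form and fitting procedure used in the paper are not given in this summary) and recovers α and β by ordinary least squares in log-log space. The helpers `fit_power_law` and `predict` are hypothetical names, and the data points are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = beta * x**alpha by least squares on (log x, log y).

    Assumes all inputs are strictly positive. Returns (alpha, beta).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx                 # slope in log-log space is the exponent
    beta = math.exp(my - alpha * mx)  # intercept in log-log space is log(beta)
    return alpha, beta


def predict(x: float, alpha: float, beta: float) -> float:
    """Extrapolate the fitted power law to a new value of x."""
    return beta * x ** alpha


if __name__ == "__main__":
    # Synthetic, noise-free points from E = 2.0 * C**-0.1 (illustrative only).
    compute = [1e18, 1e19, 1e20, 1e21]
    error = [2.0 * c ** -0.1 for c in compute]
    alpha, beta = fit_power_law(compute, error)
    print(f"alpha={alpha:.3f}, beta={beta:.3f}")
    print(f"extrapolated error at 1e22: {predict(1e22, alpha, beta):.4f}")
```

Fitting in log space turns the power law into a straight line, so the exponent α is simply the slope; with real, noisy measurements one would typically fit only the best-performing run at each scale before extrapolating.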

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Contrastive Language-Image Pre-training