Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont, Mehdi Cherti, Jenia Jitsev

TL;DR
This paper develops scaling laws for language-vision models to enable systematic comparison of models and datasets, demonstrating MaMMUT's superior scaling and efficiency over CLIP across multiple tasks and datasets.
Contribution
It introduces the first comprehensive scaling laws for CLIP and MaMMUT models, allowing for accurate model and dataset comparison across scales and tasks.
Findings
MaMMUT shows stronger improvement with scale than CLIP.
Scaling laws are consistent across different downstream tasks.
Deriving scaling laws with a constant learning rate schedule reduces compute cost.
Abstract
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law derivation can also be used for model and dataset comparison, allowing to decide which procedure is to be preferred for pre-training. For the first time, full scaling laws based on dense measurements across a wide span of model and samples seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, that use either contrastive only or contrastive and captioning text generative loss. Ensuring sufficient prediction accuracy for held out points, we use derived scaling laws to compare both models, obtaining evidence for MaMMUT's stronger improvement with scale and better sample efficiency than standard CLIP. To strengthen validity of the comparison, we show scaling…
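The comparison workflow described in the abstract (fit a scaling law per procedure, check prediction on a held-out point, then compare the fitted trends) can be sketched as follows. This is a minimal illustration, not the paper's method: the pure power-law form `error ≈ a * compute**(-b)`, the helper names `fit_power_law` and `predict`, and all numbers are assumptions made up for the example. The data are synthetic placeholders and are not measurements from the paper.

```python
import numpy as np

def fit_power_law(compute, error):
    """Fit error ≈ a * compute**(-b) by least squares in log-log space.

    Assumed functional form for illustration only; returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
    return float(np.exp(intercept)), float(-slope)

def predict(a, b, compute):
    """Evaluate the fitted power law at the given compute scale(s)."""
    return a * np.asarray(compute, dtype=float) ** (-b)

# Synthetic, noiseless placeholder measurements (NOT results from the paper).
compute = np.array([1e9, 1e10, 1e11, 1e12])
error_a = 8.0 * compute ** -0.10   # hypothetical procedure A
error_b = 12.0 * compute ** -0.12  # hypothetical procedure B

# Fit on the smaller scales and hold out the largest scale for validation.
fit_a = fit_power_law(compute[:-1], error_a[:-1])
fit_b = fit_power_law(compute[:-1], error_b[:-1])

# Relative prediction error on the held-out point for each procedure.
rel_err_a = abs(predict(*fit_a, compute[-1]) - error_a[-1]) / error_a[-1]
rel_err_b = abs(predict(*fit_b, compute[-1]) - error_b[-1]) / error_b[-1]

# A larger exponent b means error falls faster as compute grows.
better_scaling = "B" if fit_b[1] > fit_a[1] else "A"

# Compute scale at which the two fitted curves cross (requires b_A != b_B).
crossover = (fit_b[0] / fit_a[0]) ** (1.0 / (fit_b[1] - fit_a[1]))

print(f"A: a={fit_a[0]:.3f}, b={fit_a[1]:.3f}, held-out rel. err={rel_err_a:.2e}")
print(f"B: a={fit_b[0]:.3f}, b={fit_b[1]:.3f}, held-out rel. err={rel_err_b:.2e}")
print(f"Stronger improvement with scale: {better_scaling}; crossover near {crossover:.2e}")
```

With real, noisy measurements the held-out error would not be near zero, and it is that error which indicates whether the fitted law is accurate enough to support a comparison at larger scales.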
Code & Models
- 🤗 laion/openMaMMUT-ViT-L-14-DataComp-1.4B-s12.8B-b180K · model · 35 dl · ♡ 6
- 🤗 laion/scaling-laws-for-comparison · model · ♡ 2
- 🤗 laion/openMaMMUT-ViT-B-32-512x512-pt_DFN2B-ft_DFN512x512-s293M-b73k · model · 9 dl · ♡ 2
- 🤗 laion/openMaMMUT-ViT-B-16-512x512-pt_DFN2B-ft_DFN512x512-s293M-b73k · model · 8 dl · ♡ 2
- 🤗 laion/openMaMMUT-ViT-L-14-512x512-pt_datacomp1b-ft_DFN512x512-s293M-b32k · model · 11 dl · ♡ 2
- 🤗 laion/openMaMMUT-ViT-L-14-512x512-pt_datacomp1b-ft_datacomp512x512-s76M-b73k · model · 11 dl · ♡ 3
- 🤗 laion/openMaMMUT-ViT-B-16-512x512-pt_datacomp1b-ft_datacomp512x512-s76M-b73k · model · 7 dl · ♡ 3
- 🤗 laion/openMaMMUT-ViT-B-32-512x512-pt_datacomp1b-ft_datacomp512x512-s76M-b73k · model · 9 dl · ♡ 3
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
Methods: Contrastive Language-Image Pre-training
