LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki

TL;DR
This paper introduces LAION-400M, a large-scale open dataset of 400 million CLIP-filtered image-text pairs, enabling training of multi-modal models from scratch and supporting efficient similarity search.
Contribution
The creation and public release of LAION-400M, the largest open dataset of its kind, with CLIP filtering, embeddings, and search tools for multi-modal learning.
Findings
Enables training of multi-modal models from scratch.
Supports zero- and few-shot learning tasks.
Provides tools for efficient similarity search.
Abstract
Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) have recently surged in popularity, showing a remarkable capability for zero- or few-shot learning and transfer even in the absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset of 400 million CLIP-filtered image-text pairs, together with their CLIP embeddings and kNN indices that allow efficient similarity search.
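The two mechanisms named in the abstract, CLIP filtering and kNN similarity search, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that filtering means keeping a pair only when the cosine similarity between its image and text embeddings reaches a threshold, and it replaces a real kNN index with exact brute-force search. The function names, the toy 3-dimensional vectors, and the threshold of 0.3 are illustrative assumptions; the summary above does not state the cutoff, and real CLIP embeddings have hundreds of dimensions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def filter_pairs(pairs, threshold):
    """Keep (image_emb, text_emb, caption) triples whose image-text
    cosine similarity is at least `threshold` (assumed filtering rule)."""
    return [p for p in pairs if cosine(p[0], p[1]) >= threshold]


def knn(query, embeddings, k):
    """Exact brute-force kNN: indices of the k embeddings most similar
    to `query`. A stand-in for an approximate kNN index."""
    scored = [(cosine(query, e), i) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy embeddings, hypothetical values chosen only for illustration.
pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], "a photo of a cat"),  # aligned
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], "click here"),        # unrelated
    ([0.0, 1.0, 1.0], [0.0, 1.0, 0.8], "a dog on a beach"),  # aligned
]

kept = filter_pairs(pairs, threshold=0.3)  # 0.3 is an assumed cutoff
captions = [caption for _, _, caption in kept]
image_embeddings = [img for img, _, _ in kept]

# Query the surviving image embeddings with a new (hypothetical) vector.
nearest = knn([0.0, 0.9, 1.0], image_embeddings, k=1)
print(captions)
print(captions[nearest[0]])
```

Brute-force search costs one similarity computation per stored embedding for every query, which is why a dataset of this size ships precomputed kNN indices rather than relying on exhaustive comparison.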
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Contrastive Language-Image Pre-training
