LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

Christoph Schuhmann; Richard Vencu; Romain Beaumont; Robert Kaczmarczyk; Clayton Mullis; Aarush Katta; Theo Coombes; Jenia Jitsev; Aran Komatsuzaki

arXiv:2111.02114 · cs.CV · November 4, 2021 · 367 citations

Open Access 3 Repos 2 Models 2 Datasets

TL;DR

This paper introduces LAION-400M, a large-scale open dataset of 400 million CLIP-filtered image-text pairs, enabling training of multi-modal models from scratch and supporting efficient similarity search.

Contribution

The creation and public release of LAION-400M, at the time of its release the largest open dataset of its kind, together with CLIP-based filtering, the pairs' CLIP embeddings, and kNN indices for similarity search in support of multi-modal learning.
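As a rough illustration of what "CLIP filtering" means, the Python sketch below scores each candidate image-text pair by the cosine similarity between its image embedding and its text embedding, and keeps only the pairs at or above a cut-off. It is a minimal sketch, not the LAION pipeline: the toy 3-dimensional vectors stand in for real CLIP embeddings, the function names are hypothetical, and the default threshold of 0.3 is an assumption here (the LAION-400M paper describes a cut-off of this value on CLIP ViT-B/32 similarities; consult the paper for the exact filtering rules).

```python
import math
from typing import List, Sequence, Tuple

# Assumed cut-off for illustration; see the paper for the value actually used.
SIMILARITY_THRESHOLD = 0.3


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def clip_filter(
    pairs: List[Tuple[str, Sequence[float], Sequence[float]]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[str]:
    """Return the ids of pairs whose image and text embeddings are similar enough.

    Each pair is (sample_id, image_embedding, text_embedding). In a real
    pipeline the embeddings would come from a CLIP image and text encoder.
    """
    return [
        sample_id
        for sample_id, img_emb, txt_emb in pairs
        if cosine_similarity(img_emb, txt_emb) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real CLIP vectors.
    candidates = [
        ("cat_photo", [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),     # caption matches image
        ("bad_alt_text", [0.9, 0.1, 0.0], [0.0, 0.1, 0.9]),  # caption unrelated
    ]
    print(clip_filter(candidates))  # ['cat_photo']
```

Because cosine similarity ignores vector length, the filter compares only the direction of the two embeddings; raising the threshold trades dataset size for tighter agreement between images and their captions.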

Findings

1. Enables training of multi-modal models from scratch.
2. Supports zero- and few-shot learning tasks.
3. Provides tools for efficient similarity search.
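Zero-shot classification with a CLIP-style model, as referenced in the second finding, typically works by embedding one text prompt per class (for example "a photo of a dog") and choosing the class whose text embedding is most similar to the image embedding. The Python sketch below assumes those embeddings have already been computed; the toy vectors and the function name zero_shot_classify are hypothetical, and the temperature of 0.01 (i.e. scaling similarities by 100 before the softmax) is an assumed value chosen for illustration.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def zero_shot_classify(
    image_embedding: Sequence[float],
    label_text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # assumed value, not taken from the paper
) -> Dict[str, float]:
    """Return a softmax distribution over labels from image-text cosine similarities."""
    logits = {
        label: cosine(image_embedding, emb) / temperature
        for label, emb in label_text_embeddings.items()
    }
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical embeddings of the prompts "a photo of a dog" / "a photo of a cat".
    prompts = {"dog": [0.9, 0.1, 0.0], "cat": [0.1, 0.9, 0.0]}
    probs = zero_shot_classify([0.8, 0.3, 0.0], prompts)
    print(max(probs, key=probs.get))  # dog
```

No training on the target labels is involved: adding a new class only requires embedding one more text prompt, which is what makes the setup "zero-shot".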

Abstract

Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) have recently surged in popularity, showing a remarkable capability to perform zero- or few-shot learning and transfer even in the absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and publicly release LAION-400M, a dataset of 400 million CLIP-filtered image-text pairs, together with their CLIP embeddings and kNN indices that allow efficient similarity search.
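The kNN indices mentioned in the abstract let a query embedding (from an image or a text prompt) retrieve its most similar samples. The Python sketch below shows the idea with exact, brute-force cosine search over a handful of toy vectors; the names build_index and knn_search are hypothetical. At the scale of 400 million embeddings, a practical index would typically rely on an approximate nearest-neighbour library rather than scanning every vector, so treat this only as an illustration of what the search returns.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def build_index(embeddings: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Hypothetical 'index': the unit-normalized vectors keyed by sample id."""
    return {key: normalize(vec) for key, vec in embeddings.items()}


def knn_search(
    index: Dict[str, List[float]], query: Sequence[float], k: int = 2
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search by cosine similarity (brute force)."""
    q = normalize(query)
    scored = ((key, sum(a * b for a, b in zip(q, vec))) for key, vec in index.items())
    # nlargest keeps only the k best-scoring items, highest score first.
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    index = build_index({
        "img_dog": [0.9, 0.1, 0.0],
        "img_cat": [0.1, 0.9, 0.0],
        "img_car": [0.0, 0.1, 0.9],
    })
    # Hypothetical query embedding close to "dog" (e.g. from a CLIP text encoder).
    for key, score in knn_search(index, [1.0, 0.2, 0.0], k=2):
        print(key, round(score, 3))
```

Normalizing every vector once at index time means each query costs one dot product per stored sample; approximate indices exist precisely to avoid that linear scan over the whole collection.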

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training