LAION-5B: An open large-scale dataset for training next generation image-text models

Christoph Schuhmann; Romain Beaumont; Richard Vencu; Cade Gordon; Ross Wightman; Mehdi Cherti; Theo Coombes; Aarush Katta; Clayton Mullis; Mitchell Wortsman; Patrick Schramowski; Srivatsa Kundurthy; Katherine Crowson; Ludwig Schmidt; Robert Kaczmarczyk; Jenia Jitsev

arXiv:2210.08402 · cs.CV · October 18, 2022 · 1.0k citations

TL;DR

LAION-5B is a large, openly available dataset of 5.85 billion image-text pairs designed to facilitate research and development of next-generation large-scale multi-modal models like CLIP and DALL-E.

Contribution

The paper introduces LAION-5B, at release the largest openly available image-text dataset, which broadens access to, and experimentation with, large-scale image-text models.

Findings

01. Successful replication of models such as CLIP and Stable Diffusion using LAION-5B.

02. Demonstrated fine-tuning and transfer-learning capabilities of models trained on the dataset.

03. Provided tools for exploring the dataset and for detecting NSFW and watermarked content.

Abstract

Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B -…
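The zero-shot classification mentioned in the abstract can be illustrated with a minimal sketch. It assumes a pretrained image encoder and text encoder have already produced embeddings. Each class label is written as a text prompt (e.g. "a photo of a dog"), and the image is assigned the label whose prompt embedding has the highest cosine similarity to the image embedding. The 3-dimensional vectors and the helper names `cosine` and `zero_shot_classify` are hypothetical illustrations, not part of any LAION or CLIP release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, prompt_embs):
    """Pick the label whose prompt embedding is closest to the image embedding.

    prompt_embs maps each class label to the (hypothetical) text embedding
    of a prompt such as "a photo of a {label}". No training on the labels
    is needed: classification is pure similarity ranking.
    """
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up embeddings standing in for real encoder outputs (assumption).
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {
    "dog": [1.0, 0.0, 0.1],
    "cat": [0.1, 1.0, 0.0],
    "car": [0.0, 0.2, 1.0],
}

label, scores = zero_shot_classify(image_emb, prompt_embs)
print(label)  # dog
```

Because the label set is only a list of prompts, new classes can be added at inference time without retraining. This is the property that makes models trained on large, noisy image-text datasets usable as zero-shot classifiers.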

Videos

LAION-5B: An open large-scale dataset for training next generation image-text models (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling

Methods: Guided Language to Image Diffusion for Generation and Editing · Diffusion · Contrastive Language-Image Pre-training · ALIGN