LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev

TL;DR
LAION-5B is a large, openly available dataset of 5.85 billion image-text pairs designed to facilitate research and development of next-generation large-scale multi-modal models like CLIP and DALL-E.
Contribution
The paper introduces LAION-5B, the largest open dataset of its kind, enabling broader access and experimentation with large-scale image-text models.
Findings
Successful replication of models like CLIP and Stable Diffusion using LAION-5B.
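Demonstrated fine-tuning and transfer-learning experiments using the dataset.
Provided nearest-neighbor indices, a web interface for dataset exploration and subset generation, and detection scores for watermarked, NSFW, and toxic content.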
Abstract
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B -…
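The zero-shot classification mentioned in the abstract works roughly like this: a CLIP-style model embeds an image and one text prompt per class into a shared space, and the class whose prompt embedding is most similar (by cosine similarity) to the image embedding wins. The following is a minimal pure-Python sketch, not code from the paper; the toy 3-dimensional vectors stand in for real encoder outputs, and the prompt strings and the fixed logit scale of 100 are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    image_emb: Sequence[float],
    prompt_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumption: fixed here; real CLIP learns this scale
) -> Dict[str, float]:
    """Return a probability per class prompt from its cosine similarity to the image."""
    img = l2_normalize(image_emb)
    names = list(prompt_embs)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(prompt_embs[n])))
        for n in names
    ]
    return dict(zip(names, softmax(logits)))


# Hypothetical toy embeddings standing in for image/text encoder outputs.
image = [0.9, 0.1, 0.0]
prompts = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
probs = zero_shot_classify(image, prompts)
print(max(probs, key=probs.get))  # prints: a photo of a cat
```

No class-specific training is involved: adding a new class only requires embedding one more prompt.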
Code & Models
- 🤗 jm12138/riffusion-model-v1 (model · 3 likes)
- 🤗 timm/vit_base_patch32_clip_224.laion2b_ft_in1k (model · 59 downloads · 1 like)
- 🤗 timm/vit_large_patch14_clip_224.laion2b_ft_in1k (model · 840 downloads)
- 🤗 timm/vit_huge_patch14_clip_224.laion2b_ft_in1k (model · 460 downloads)
- 🤗 timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k (model · 1.2k downloads)
- 🤗 timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k (model · 1.3k downloads · 2 likes)
- 🤗 timm/vit_large_patch14_clip_224.laion2b_ft_in12k (model · 73 downloads)
- 🤗 timm/vit_huge_patch14_clip_224.laion2b_ft_in12k (model · 59 downloads · 1 like)
- 🤗 timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k (model · 433 downloads · 2 likes)
- 🤗 timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k (model · 1.7k downloads · 1 like)
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Guided Language to Image Diffusion for Generation and Editing · Diffusion · Contrastive Language-Image Pre-training · ALIGN
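Contrastive Language-Image Pre-training, listed under Methods, is how CLIP-style models learn from noisy image-text pairs such as those in LAION-5B: within a batch, each image should score highest against its own caption, and each caption against its own image. The sketch below illustrates that symmetric cross-entropy (InfoNCE) objective in plain Python under stated assumptions; it is not the training code used in the paper. The toy embeddings are hypothetical, and the temperature is fixed at 0.07 (the value CLIP initializes its learnable temperature to) purely for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(row: Sequence[float], target: int) -> float:
    """Log-probability of `target` under a softmax over `row` (log-sum-exp trick)."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(z - m) for z in row))
    return row[target] - log_sum


def clip_contrastive_loss(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumption: fixed here; CLIP learns the temperature
) -> float:
    """Symmetric InfoNCE: (image i, text i) is the only positive pair for index i."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each image's row is a classification over all n captions.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each caption's column is a classification over all n images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Hypothetical toy batch: correctly paired captions give a much lower loss.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(clip_contrastive_loss(images, matched) < clip_contrastive_loss(images, shuffled))  # prints: True
```

Averaging the two directions keeps the objective symmetric, so neither the image encoder nor the text encoder is favored; the other captions in the batch serve as negatives, which is why no per-image class labels are needed.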
