MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

Benjamin Aubin; Gonzalo Iñaki Quintana; Onur Tasar; Sanjeev Sreetharan; Urszula Czerwinska; Damien Henry; Clément Chadebec

arXiv:2605.21272 · cs.CV · May 21, 2026

TL;DR

MONET is a large, open, and richly annotated text-to-image dataset with 104.9 million pairs, designed to facilitate reproducible research in large-scale text-to-image modeling.

Contribution

The paper introduces MONET, a large filtered, deduplicated, re-captioned, and synthetically augmented dataset for text-to-image training, enabling scalable and reproducible research.

Findings

1. Training on MONET yields competitive GenEval and DPG scores.
2. MONET's diverse and high-quality data supports large-scale model training.
3. The dataset accelerates research by providing pre-computed embeddings and annotations.

Abstract

Training large text-to-image models requires high-quality, curated datasets with diverse content and detailed captions. Yet the cost and complexity of collecting, filtering, deduplicating, and re-captioning such corpora at scale hinder open and reproducible research in the field. We introduce MONET, an open Apache 2.0 dataset of approximately 104.9M image-text pairs, collected from 2.9B raw pairs across heterogeneous open sources. The raw pairs pass through successive stages of safety filtering, domain-based filtering, and exact and near-duplicate removal, followed by re-captioning with multiple vision-language models that produce descriptions ranging from short to long-form; the result is further augmented with synthetically generated samples. Each image is shipped with pre-computed embeddings and annotations to accelerate downstream use. To validate the effectiveness of MONET, we train a 4B-parameter latent diffusion model exclusively on it and…
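
The abstract lists exact and near-duplicate removal among the filtering stages but does not say how either is implemented. The following Python sketch shows one common approach, under assumptions that are not taken from the paper: exact duplicates are detected with a SHA-256 hash of the image bytes, and near duplicates by cosine similarity between image embeddings, using a hypothetical threshold of 0.95. The function names (`exact_dedup`, `near_dedup`, `cosine`) and the threshold are illustrative only.

```python
import hashlib
import math
from typing import Sequence


def exact_dedup(images: Sequence[bytes]) -> list[int]:
    """Return indices of the first occurrence of each byte-identical image.

    Assumption: SHA-256 over raw bytes stands in for whatever exact-match key
    the real pipeline uses.
    """
    seen: set[str] = set()
    keep: list[int] = []
    for i, data in enumerate(images):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(i)
    return keep


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_dedup(
    embeddings: Sequence[Sequence[float]],
    candidates: Sequence[int],
    threshold: float = 0.95,  # hypothetical value, not from the paper
) -> list[int]:
    """Greedily keep a candidate unless it is too similar to an already-kept one."""
    kept: list[int] = []
    for i in candidates:
        if all(cosine(embeddings[i], embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy usage: image 2 is a byte-exact copy of image 0, and image 3 has an
# embedding almost identical to image 0's (cosine similarity about 0.999).
images = [b"cat", b"dog", b"cat", b"cow"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.99, 0.05]]
after_exact = exact_dedup(images)                  # [0, 1, 3]
after_near = near_dedup(embeddings, after_exact)   # [0, 1]
```

The greedy pairwise comparison is quadratic and only suits toy inputs. At the scale described (2.9B raw pairs reduced to about 104.9M, roughly 3.6% retained across all stages), a practical pipeline would more likely rely on an approximate nearest-neighbour index.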

Code & Models

Datasets

jasperai/monet
dataset · 249k downloads