MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation
Jialin Luo, Yuanzhi Wang, Ziqi Gu, Yide Qiu, Shuaizhen Yao, Fuyun Wang, Chunyan Xu, Wenhua Zhang, Dan Wang, Zhen Cui

TL;DR
This paper introduces MMM-RS, a remote sensing dataset spanning multiple modalities, ground sample distances (GSDs), and scenes, intended to improve diffusion-based text-to-image generation for remote sensing applications.
Contribution
The paper presents a large-scale, multi-modal remote sensing dataset with standardized samples and rich text-image pairs, facilitating advanced text-to-image generation in remote sensing.
Findings
Enables diffusion models to generate diverse RS images across modalities and scenes
Contains approximately 2.1 million text-image pairs for training and evaluation
Supports various weather conditions and GSD in remote sensing image synthesis
Abstract
Recently, the diffusion-based generative paradigm has achieved impressive general image generation capabilities with text prompts due to its accurate distribution modeling and stable training process. However, generating diverse remote sensing (RS) images that are tremendously different from general images in terms of scale and perspective remains a formidable challenge due to the lack of a comprehensive remote sensing image generation dataset with various modalities, ground sample distances (GSD), and scenes. In this paper, we propose a Multi-modal, Multi-GSD, Multi-scene Remote Sensing (MMM-RS) dataset and benchmark for text-to-image generation in diverse remote sensing scenarios. Specifically, we first collect nine publicly available RS datasets and conduct standardization for all samples. To bridge RS images to textual semantic information, we utilize a large-scale pretrained…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Diffusion
