DiffusionSat: A Generative Foundation Model for Satellite Imagery
Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

TL;DR
DiffusionSat is the first large-scale generative foundation model tailored for satellite imagery, capable of realistic, multi-task generation including superresolution, in-painting, and temporal synthesis, leveraging metadata for conditioning.
Contribution
The paper introduces DiffusionSat, the largest generative model for satellite images, incorporating metadata for conditioning and supporting multiple generation tasks, outperforming previous methods.
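The summary above does not spell out how metadata enters the model. As a minimal sketch only, one common pattern in diffusion models is to encode each numeric metadata value with the same sinusoidal scheme used for the diffusion timestep and add the result to the timestep embedding. The field names (`latitude`, `longitude`, `gsd_m`), the embedding width, and the plain summation are assumptions made for illustration, not details confirmed by this page; a real model would normally pass each embedding through a learned projection, which is omitted here.

```python
import numpy as np


def sinusoidal_embedding(value: float, dim: int = 8, max_period: float = 10000.0) -> np.ndarray:
    """Encode one scalar as [sin(value * f_i), cos(value * f_i)] over dim // 2 frequencies."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def metadata_conditioning(metadata: dict, timestep: int, dim: int = 8) -> np.ndarray:
    """Combine a timestep embedding with embeddings of numeric metadata fields.

    Assumption: metadata fields are simply summed with the timestep embedding.
    The learned per-field projections a trained model would use are left out.
    """
    t_emb = sinusoidal_embedding(float(timestep), dim)
    m_emb = sum(sinusoidal_embedding(float(v), dim) for v in metadata.values())
    return t_emb + m_emb


# Hypothetical metadata for one satellite image.
example = {"latitude": 37.4, "longitude": -122.1, "gsd_m": 10.0}
cond = metadata_conditioning(example, timestep=50)
print(cond.shape)
```

The resulting vector would take the place of the usual timestep embedding in the denoising network, so numeric metadata can steer generation even when no text caption is available.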
Findings
Outperforms previous state-of-the-art satellite image generation methods.
Supports diverse tasks like superresolution, in-painting, and temporal generation.
First large-scale generative foundation model for satellite imagery.
Abstract
Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images are significantly different from natural images -- they can be multi-spectral, irregularly sampled across time -- and existing diffusion models trained on images from the Web do not support them. Furthermore, remote sensing data is inherently spatio-temporal, requiring conditional generation tasks not supported by traditional methods based on captions or images. In this paper, we present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets. As text-based captions are sparsely…
Peer Reviews
Decision·ICLR 2024 poster
**(S1):** This work presents a novel diffusion-based approach for remote sensing data. It is great to see people extending diffusion to remote sensing data, which is complex in nature given its multi-spectral composition. **(S2):** This work outlines multiple generative downstream tasks for remote sensing that go beyond "simple" image generation. This is important because in remote sensing we have no shortage of data and therefore not much demand for "simple" image generation.
**(W1):** The presented work is very domain-specific, i.e., limited to remote sensing data. It would be interesting to see whether this approach generalizes to other datasets of similar multi-spectral data. **(W2):** I am confused by the "4. Experiment" section. It is not always straightforward to link the tables and images to the different generative downstream tasks presented. It also seems that some results are missing (i.e., the In-painting section/paragraph is missing entirely)? This point also go
This work proposes a generative foundation model for remote sensing data based on Stable Diffusion. The proposed foundation model produces realistic samples and can be used to solve multiple generative tasks, including temporal generation, multi-spectral superresolution, and in-painting.
1. The necessity of and motivation for designing the generative foundation model are not clear or convincing. 2. The methodology for training the proposed foundation model is not novel, since the whole framework is a combination of Stable Diffusion and ControlNet.
* This paper tackles an important problem that is understudied in the computer vision field and has important, positive societal applications.
* The incorporation of additional metadata and the problem setup are novel and can spin off a new line of work in geospatial ML.
* The generated dataset used for this study can be very useful in other applications.
* **Results for certain tasks are missing or incomplete.** The paper mentions that it shows state-of-the-art results for super-resolution, temporal generation, and in-painting. However, only a single qualitative example is provided as a result. Multiple other relevant approaches have also been proposed. The superresolution results only compare the proposed approach with a Stable Diffusion baseline and ignore the line of work done in the field, including [1,2] and others.
* **The paper write up needs wo
Taxonomy
Topics: Computational and Text Analysis Methods · Computational Physics and Python Applications · Music and Audio Processing
Methods: Diffusion
