TL;DR
This paper demonstrates how a diffusion-based generative model can synthesize realistic wildfire satellite images conditioned on burn masks, aiding data augmentation for wildfire detection.
Contribution
It evaluates EarthSynth, a diffusion-based foundation model for Earth Observation, for generating and inpainting post-wildfire imagery conditioned on burn masks without task-specific retraining.
Findings
Inpainting pipelines outperform full-tile generation across metrics.
The structured inpainting prompt achieves the best spatial alignment and burn saliency.
VLM-assisted inpainting is competitive with hand-crafted prompts.
Abstract
The scarcity of labeled satellite imagery remains a fundamental bottleneck for deep-learning (DL)-based wildfire monitoring systems. This paper investigates whether a diffusion-based foundation model for Earth Observation (EO), EarthSynth, can synthesize realistic post-wildfire Sentinel-2 RGB imagery conditioned on existing burn masks, without task-specific retraining. Using burn masks derived from the CalFireSeg-50 dataset (Martin et al., 2025), we design and evaluate six controlled experimental configurations that systematically vary: (i) pipeline architecture (mask-only full generation vs. inpainting with pre-fire context), (ii) prompt engineering strategy (three hand-crafted prompts and a VLM-generated prompt via Qwen2-VL), and (iii) a region-wise color-matching post-processing step. Quantitative assessment on 10 stratified test samples uses four complementary metrics: Burn IoU,…
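The abstract names Burn IoU as one of its metrics but the excerpt does not define it. As a rough illustration only, the sketch below assumes it is the standard intersection-over-union between the conditioning burn mask and a burn mask recovered from the generated image. The function name `burn_iou`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details taken from the paper.

```python
def burn_iou(pred_mask, ref_mask):
    """Intersection-over-union of two binary masks (nested lists of 0/1).

    Hypothetical helper: the paper's exact Burn IoU definition is not
    given in the excerpt, so this is the generic IoU formula.
    """
    same_height = len(pred_mask) == len(ref_mask)
    same_widths = all(len(p) == len(r) for p, r in zip(pred_mask, ref_mask))
    if not (same_height and same_widths):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        for p, r in zip(pred_row, ref_row):
            p, r = bool(p), bool(r)
            intersection += p and r  # pixel is burned in both masks
            union += p or r          # pixel is burned in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 3x3 example: 3 shared burned pixels, 5 burned pixels in total.
reference = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
predicted = [[0, 0, 1],
             [0, 1, 1],
             [0, 0, 1]]
score = burn_iou(predicted, reference)
```

In the toy example the two masks share 3 burned pixels out of 5 burned in either mask, so the score is 0.6. A higher score means the generated burn scar follows the conditioning mask more closely.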
