# A Benchmark Dataset for Satellite-Based Estimation and Detection of Rain

**Authors:** Simon Pfreundschuh, Malarvizhi Arulraj, Ali Behrangi, Linda Bogerd, Alan James Peixoto Calheiros, Daniele Casella, Neda Dolatabadi, Clement Guilloteau, Jie Gong, Christian D. Kummerow, Pierre Kirstetter, Gyuwon Lee, Maximilian Maahn, Lisa Milani, Giulia Panegrossi, Rayana Palharini, Veljko Petković, Soorok Ryu, Paolo Sanó, Jackson Tan

PMC · DOI: 10.1038/s41597-026-06565-0 · Scientific Data · 2026-01-15

## TL;DR

This paper introduces SatRain, a benchmark dataset for developing and fairly comparing machine learning methods for satellite-based rain detection and estimation.

## Contribution

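The main contribution is SatRain, developed by the International Precipitation Working Group (IPWG) as the first standardized AI benchmark dataset for satellite-based precipitation retrieval. It lets machine learning methods be compared on common data against a common reference.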

## Key findings

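- SatRain pairs multi-sensor observations from the primary satellite platforms used in precipitation remote sensing with reference precipitation estimates derived from gauge-corrected ground-based radar composites over the conterminous United States.
- A standardized evaluation protocol and out-of-distribution test data from Asia and Europe enable robust, reproducible comparisons between methods.
- The diversity of sensors and the inclusion of time-resolved geostationary observations make SatRain a foundation for developing next-generation AI models for global precipitation estimation.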

## Abstract

Accurately tracking the global distribution of precipitation is essential for both research and operational meteorology. Satellite observations remain the only means of achieving consistent, global precipitation monitoring. While machine learning has long been applied to satellite-based precipitation retrieval, the absence of a standardized benchmark dataset has hindered fair comparisons between methods. To address this, the International Precipitation Working Group has developed SatRain, the first AI benchmark dataset for satellite-based detection and estimation of rain. SatRain integrates multi-sensor satellite observations from the primary platforms used in precipitation remote sensing with high-quality reference precipitation estimates derived from gauge-corrected ground-based radar composites over the conterminous United States. It offers a standardized evaluation protocol and out-of-distribution testing data from Asia and Europe to enable robust and reproducible comparisons across machine learning approaches. In addition to algorithm evaluation, the diversity of sensors and inclusion of time-resolved geostationary observations make SatRain a valuable foundation for developing next-generation AI models to deliver more accurate global precipitation estimates.
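The abstract separates two tasks, detecting rain and estimating how much falls, and scores both against a radar-based reference. As a rough illustration of what such scoring involves, the sketch below compares a retrieval with reference rain rates at matched pixels. It uses common verification scores: probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) for detection, and bias, mean squared error, and Pearson correlation for estimation. The metric set, the 0.1 mm/h rain/no-rain threshold, and all function names are assumptions for illustration, not SatRain's published protocol. That protocol is described in the full paper.

```python
"""Minimal sketch of scoring a rain retrieval against reference precipitation.

Assumptions (illustrative, not SatRain's published protocol): the metric set,
the 0.1 mm/h rain/no-rain threshold, and every function name below.
"""
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence

RAIN_THRESHOLD_MM_H = 0.1  # hypothetical rain/no-rain threshold in mm/h


def rain_mask(rain_rates: Sequence[float], threshold: float = RAIN_THRESHOLD_MM_H) -> List[bool]:
    """Flag a pixel as raining when its rain rate reaches the threshold."""
    return [rate >= threshold for rate in rain_rates]


def _safe_ratio(numerator: float, denominator: float) -> float:
    """Return NaN instead of raising when a score is undefined."""
    return numerator / denominator if denominator else float("nan")


@dataclass
class DetectionScores:
    pod: float  # probability of detection: hits / (hits + misses)
    far: float  # false alarm ratio: false alarms / (hits + false alarms)
    csi: float  # critical success index: hits / (hits + misses + false alarms)


def detection_scores(retrieved: Sequence[bool], reference: Sequence[bool]) -> DetectionScores:
    """Contingency-table scores for rain/no-rain detection."""
    if len(retrieved) != len(reference):
        raise ValueError("retrieved and reference must have the same length")
    pairs = list(zip(retrieved, reference))
    hits = sum(1 for r, t in pairs if r and t)
    misses = sum(1 for r, t in pairs if not r and t)
    false_alarms = sum(1 for r, t in pairs if r and not t)
    return DetectionScores(
        pod=_safe_ratio(hits, hits + misses),
        far=_safe_ratio(false_alarms, hits + false_alarms),
        csi=_safe_ratio(hits, hits + misses + false_alarms),
    )


def estimation_scores(retrieved: Sequence[float], reference: Sequence[float]) -> dict:
    """Bias, mean squared error, and Pearson correlation of rain rates."""
    if len(retrieved) != len(reference) or not retrieved:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(retrieved)
    bias = sum(r - t for r, t in zip(retrieved, reference)) / n
    mse = sum((r - t) ** 2 for r, t in zip(retrieved, reference)) / n
    mean_r = sum(retrieved) / n
    mean_t = sum(reference) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(retrieved, reference))
    var_r = sum((r - mean_r) ** 2 for r in retrieved)
    var_t = sum((t - mean_t) ** 2 for t in reference)
    # Correlation is undefined (NaN) when either series is constant.
    corr = _safe_ratio(cov, sqrt(var_r * var_t))
    return {"bias": bias, "mse": mse, "correlation": corr}


if __name__ == "__main__":
    # Toy rain rates (mm/h) at five matched pixels; purely illustrative numbers.
    retrieved = [0.0, 0.5, 2.0, 0.0, 1.2]
    reference = [0.0, 0.8, 1.5, 0.3, 0.0]
    print(detection_scores(rain_mask(retrieved), rain_mask(reference)))
    print(estimation_scores(retrieved, reference))
```

The sketch keeps detection and estimation scores separate to mirror the two tasks named in the abstract. A real evaluation would likely also need to restrict scoring to pixels where both the satellite observations and the reference are valid.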

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12909819/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12909819/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12909819/full.md

---
Source: https://tomesphere.com/paper/PMC12909819