# A large-scale dataset for training deep learning segmentation and tracking of extreme weather

**Authors:** Sol Kim, Andre Graubner, Lukas Kapp-Schwoerer, Karthik Kashinath, Konrad Schindler

PMC · DOI: 10.1038/s41597-025-05480-0 · 2025-07-05

## TL;DR

This paper introduces a large dataset of expert-annotated extreme weather events to improve deep learning models for tracking and analyzing such events.

## Contribution

The paper presents the largest dataset of expert-guided, hand-labeled segmentation masks for extreme weather events.

## Key findings

- The dataset includes global annotations for atmospheric rivers, tropical cyclones, and atmospheric blocking events.
- Every timestep of each event is annotated by two separate annotators, for a total of 49,184 labeled timesteps.
- The annotations have characteristics similar to those produced by other detection methods and to annotations made directly by domain experts.

## Abstract

As Earth’s climate continues to change, it is imperative to gain an understanding of how high-impact, extreme weather events will change. Researchers increasingly rely on data-driven, learning-based approaches to detect and track extreme weather events. While several attempts have been made to generate datasets of hand-labeled weather or climate data, a significant challenge has been gathering a sufficient number of expert-annotated samples. To address this challenge, we introduce the largest dataset of expert-guided, hand-labeled segmentation masks of extreme weather events. It contains global annotations for atmospheric rivers, tropical cyclones, and atmospheric blocking events from the European Centre for Medium-Range Weather Forecasts’ reanalysis version 5 (ERA5). Every timestep for each event is annotated by two separate annotators, bringing the total number of labeled timesteps to 49,184. Professional annotators were trained and guided by domain experts to identify these features, and event-specific experts were consulted for each of the annotation guides. The resulting annotations are shown to have characteristics similar to those of other methods and to annotations generated directly by domain experts.
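Because every timestep is labeled by two annotators, one simple way to gauge how closely they agree is the intersection over union (IoU) of their segmentation masks. The sketch that follows is illustrative only: the mask layout (nested lists of 0/1 values on a latitude-longitude grid), the helper names `mask_iou` and `mean_agreement`, and the convention of scoring two empty masks as perfect agreement are all hypothetical assumptions, not the dataset's actual file format or the authors' evaluation protocol.

```python
from typing import Sequence

# Hypothetical layout: one binary mask per timestep, 1 = pixel belongs to the event.
Mask = Sequence[Sequence[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += bool(pa) and bool(pb)  # both annotators marked this pixel
            union += bool(pa) or bool(pb)   # at least one annotator marked it
    # Assumed convention: if neither annotator marked anything, agreement is perfect.
    return 1.0 if union == 0 else inter / union


def mean_agreement(ann1: Sequence[Mask], ann2: Sequence[Mask]) -> float:
    """Average per-timestep IoU between two annotators' mask sequences."""
    if len(ann1) != len(ann2) or not ann1:
        raise ValueError("need the same, non-zero number of timesteps")
    return sum(mask_iou(a, b) for a, b in zip(ann1, ann2)) / len(ann1)


if __name__ == "__main__":
    # Toy 2x3 masks standing in for two annotators' labels at one timestep.
    first = [[0, 1, 1], [0, 1, 0]]
    second = [[0, 1, 0], [0, 1, 1]]
    empty = [[0, 0, 0], [0, 0, 0]]
    print(mask_iou(first, second))                        # 0.5
    print(mean_agreement([first, empty], [second, empty]))  # 0.75
```

IoU is used here because it ignores the (usually large) background area where both annotators agree that no event is present; pixel accuracy on a global grid would look high almost regardless of how well the event outlines match.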

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12228819/full.md

---
Source: https://tomesphere.com/paper/PMC12228819