Cumulo: A Dataset for Learning Cloud Classes
Valentina Zantedeschi, Fabrizio Falasca, Alyson Douglas, Richard Strange, Matt J. Kusner, Duncan Watson-Parris

TL;DR
Cumulo is a new high-resolution cloud dataset combining MODIS imagery with CloudSat labels, enabling machine learning models to improve cloud classification for climate research.
Contribution
The paper introduces Cumulo, a benchmark dataset that merges MODIS hyperspectral imagery with CloudSat cloud labels for training and evaluating global cloud classification models.
Findings
Baseline performance established with an invertible flow generative model (IResNet).
Discovery of new sub-classes within existing cloud categories.
Evaluation criteria covering both classification accuracy and physical realism.
Abstract
One of the greatest sources of uncertainty in future climate projections comes from limitations in modelling clouds and in understanding how different cloud types interact with the climate system. A key first step in reducing this uncertainty is to accurately classify cloud types at high spatial and temporal resolution. In this paper, we introduce Cumulo, a benchmark dataset for training and evaluating global cloud classification models. It consists of one year of 1km resolution MODIS hyperspectral imagery merged with pixel-width 'tracks' of CloudSat cloud labels. Bringing these complementary datasets together is a crucial first step, enabling the Machine-Learning community to develop innovative new techniques which could greatly benefit the Climate community. To showcase Cumulo, we provide baseline performance analysis using an invertible flow generative model (IResNet), which further…
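Because each MODIS image carries CloudSat labels only along a one-pixel-wide track, a per-pixel classifier sees dense inputs but sparse targets. The following is a minimal NumPy sketch of one way to handle this. It assumes labels are stored as a per-pixel map with a sentinel value for pixels off the track, and that loss and accuracy are computed only on track pixels. The tile size, class count, sentinel value, linear stand-in model, and the function names `make_track_labels`, `masked_cross_entropy` and `masked_accuracy` are all hypothetical illustrations, not Cumulo's actual file format or the authors' training code.

```python
import numpy as np

# Hypothetical sizes for illustration only (not Cumulo's actual dimensions).
N_CHANNELS, HEIGHT, WIDTH = 3, 6, 6
N_CLASSES = 8      # assumed number of cloud classes
UNLABELED = -1     # assumed sentinel for pixels off the CloudSat track

def make_track_labels(height, width, track_cols, track_classes):
    """Label map that is UNLABELED everywhere except along a one-pixel-wide
    track; track_cols[i] is the track's column in row i."""
    labels = np.full((height, width), UNLABELED, dtype=np.int64)
    rows = np.arange(height)
    labels[rows, track_cols] = track_classes
    return labels

def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over labeled pixels only.
    logits: (n_classes, H, W); labels: (H, W), UNLABELED off the track."""
    mask = labels != UNLABELED
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.nonzero(mask)
    picked = log_probs[labels[rows, cols], rows, cols]
    return float(-picked.mean())

def masked_accuracy(logits, labels):
    """Fraction of labeled pixels whose argmax class matches the label."""
    mask = labels != UNLABELED
    preds = logits.argmax(axis=0)
    return float((preds[mask] == labels[mask]).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(N_CHANNELS, HEIGHT, WIDTH))  # stand-in radiances
    track_cols = np.arange(HEIGHT)                        # diagonal track
    track_classes = rng.integers(0, N_CLASSES, size=HEIGHT)
    labels = make_track_labels(HEIGHT, WIDTH, track_cols, track_classes)

    # Stand-in per-pixel linear model: channels -> class logits.
    weights = rng.normal(size=(N_CLASSES, N_CHANNELS))
    logits = np.einsum("kc,chw->khw", weights, image)

    print("labeled pixels:", int((labels != UNLABELED).sum()))  # one per row
    print("loss:", masked_cross_entropy(logits, labels))
    print("accuracy:", masked_accuracy(logits, labels))
```

Masking, rather than filling off-track pixels with a default class, keeps the unlabeled majority of each image from contributing a spurious training signal; the same mask is reused so that accuracy is reported only where ground truth exists.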
