Setting the Standard: Recommended Practices for Data Preprocessing in Data-Driven Climate Prediction
Jason C. Furtado, Maria J. Molina, Marybeth C. Arcodia, Weston Anderson, Tom Beucler, John A. Callahan, Laura M. Ciasto, Vittorio A. Gensini, Michelle L'Heureux, Kathleen Pegion, Jhayron S. Pérez-Carrasquilla, Maike Sonnewald, Ken Takahashi, Baoqiang Xiang

TL;DR
This paper establishes standardized data preprocessing protocols for AI/ML climate prediction models, emphasizing their importance for improving forecast accuracy, robustness, and transparency across various timescales.
Contribution
It provides comprehensive guidelines and case studies on data preprocessing techniques tailored for climate prediction models, addressing a critical gap in current practices.
Findings
Preprocessing significantly affects climate prediction outcomes.
Standardized anomaly creation improves model robustness.
Proper handling of non-stationarity and extremes enhances forecast reliability.
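To make the anomaly finding concrete, here is a minimal sketch of one common way to build standardized anomalies: subtract a monthly climatology and divide by the monthly standard deviation, with both statistics estimated from the training period only so that no information leaks from held-out years. This is an illustration under stated assumptions, not the paper's protocol. The function name `standardized_anomalies`, the 30-year training window, and the synthetic series are all hypothetical.

```python
import numpy as np

def standardized_anomalies(data, months, train_mask):
    """Standardize each calendar month using training-period statistics only.

    data:       array with time as the first axis
    months:     calendar month (1..12) of each time step
    train_mask: boolean array, True where the time step is in the training period
    Assumes every calendar month present has at least two training samples.
    """
    data = np.asarray(data, dtype=float)
    months = np.asarray(months)
    train_mask = np.asarray(train_mask, dtype=bool)
    out = np.full_like(data, np.nan)
    for m in range(1, 13):
        in_month = months == m
        if not in_month.any():
            continue
        ref = data[in_month & train_mask]   # training samples for this month
        mean = ref.mean(axis=0)             # monthly climatology
        std = ref.std(axis=0, ddof=1)       # sample standard deviation
        out[in_month] = (data[in_month] - mean) / std
    return out

# Synthetic monthly series (hypothetical): seasonal cycle + linear trend + noise.
rng = np.random.default_rng(0)
n_years = 40
months = np.tile(np.arange(1, 13), n_years)
t = np.arange(months.size)
series = 10 * np.sin(2 * np.pi * (months - 1) / 12) + 0.01 * t + rng.normal(size=t.size)

train_mask = t < 30 * 12                    # first 30 years used for training
z = standardized_anomalies(series, months, train_mask)
```

Because the synthetic series carries an imposed upward trend, the held-out anomalies in this toy example are shifted positive relative to the training climatology. That shift is a small-scale picture of why non-stationarity needs explicit handling (for example, detrending with parameters fitted on the training period).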
Abstract
Artificial intelligence (AI) - and specifically machine learning (ML) - applications for climate prediction across timescales are proliferating quickly. The emergence of these methods prompts a revisit to the impact of data preprocessing, a topic familiar to the climate community, as more traditional statistical models work with relatively small sample sizes. Indeed, the skill and confidence in the forecasts produced by data-driven models are directly influenced by the quality of the datasets and how they are treated during model development, thus yielding the colloquialism, "garbage in, garbage out." As such, this article establishes protocols for the proper preprocessing of input data for AI/ML models designed for climate prediction (i.e., subseasonal to decadal and longer). The three aims are to: (1) educate researchers, developers, and end users on the effects that preprocessing has…
Taxonomy
Topics: Climate variability and models · Meteorological Phenomena and Simulations · Hydrological Forecasting Using AI
