Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction
Jared D. Willard, Peter Harrington, Shashank Subramanian, Ankur Mahesh, Travis A. O'Brien, William D. Collins

TL;DR
This paper demonstrates that high-quality weather forecasts can be achieved using simple, off-the-shelf transformer architectures with straightforward training procedures, supported by ablation studies and diverse evaluation metrics.
Contribution
It shows that effective large-scale weather prediction is possible with minimal modifications and moderate resources, providing insights into training components and model scaling.
Findings
A minimally modified SwinV2 transformer trained on ERA5 attains higher forecast skill than IFS, a traditional physics-based NWP system.
Simple training setups can achieve high forecast skill.
Model performance improves with increased size and depth.
Abstract
The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models that forecast atmospheric variables with skill comparable or superior to that of traditional physics-based NWP. However, among these leading DL models, there is wide variance in both the training settings and the architectures used. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data and find that it attains superior forecast skill when compared against IFS. We present some ablations on key aspects of the training pipeline, exploring different loss functions, model…
Taxonomy
Topics: Energy Load and Power Forecasting · Advanced Computational Techniques and Applications · Thermal Analysis in Power Transmission
