Scaling Laws of Global Weather Models
Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler

TL;DR
This paper investigates empirical scaling laws in data-driven global weather models, revealing that larger datasets and wider architectures significantly improve model performance, guiding future model design and training strategies.
Contribution
It provides the first comprehensive analysis of how model size, dataset size, and compute influence weather model performance, highlighting optimal resource allocation and architectural preferences.
Findings
For Aurora, the model with the strongest data scaling, increasing the training dataset size by 10x reduces validation loss by up to 3.2x.
Wider models outperform deeper ones in weather forecasting tasks.
Under a fixed compute budget, allocating compute to longer training improves performance more than increasing model size.
Abstract
Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size (N), dataset size (D), and compute budget (C). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from…
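To make the Aurora data-scaling figure concrete, the short sketch below converts "10x more data, up to 3.2x lower loss" into a power-law exponent. It assumes validation loss follows a pure power law in dataset size, L ∝ D^(-alpha), which is a simplifying assumption made here and not a functional form quoted from the paper. The function names are hypothetical, and because the abstract says "up to", the resulting exponent should be read as a best-case value.

```python
import math


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Return alpha such that L ∝ D**(-alpha).

    Assumes a pure power law (an assumption of this sketch): multiplying
    the dataset size by `data_factor` divides the loss by `loss_factor`.
    """
    return math.log(loss_factor) / math.log(data_factor)


def loss_reduction(data_factor: float, alpha: float) -> float:
    """Factor by which the loss shrinks when the dataset grows by `data_factor`."""
    return data_factor ** alpha


# Best-case figure from the abstract: 10x data -> up to 3.2x lower loss.
alpha = implied_exponent(10.0, 3.2)
print(f"implied exponent alpha = {alpha:.3f}")

# Extrapolating the same assumed law to 100x data (illustrative only).
print(f"100x data -> {loss_reduction(100.0, alpha):.2f}x lower loss")
```

Under this assumption the exponent is log10(3.2), roughly 0.5, so the loss would fall approximately with the square root of the dataset size. Any extrapolation beyond the range the authors measured is illustrative only.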
Taxonomy
Topics: Meteorological Phenomena and Simulations · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
