Towards Scaling Law Analysis For Spatiotemporal Weather Data
Alexander Kiefer, Prasanna Balaprakash, Xiao Wang

TL;DR
This paper extends neural scaling laws to long-horizon, multi-channel weather forecasting, revealing heterogeneity in error growth and implications for model training and resource allocation.
Contribution
It introduces a framework for analyzing scaling laws in autoregressive weather models across multiple channels and forecast horizons, highlighting complex error behaviors.
Findings
Prediction error varies significantly across channels and horizons.
Power-law scaling may not hold uniformly across all channels and forecast lengths.
Heterogeneous error growth impacts training strategies and resource distribution.
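To make the per-channel, per-horizon view concrete, the following minimal sketch rolls a one-step model forward autoregressively and records RMSE for each channel at each lead time. Everything here is an assumption for illustration: `rollout_rmse`, `step_fn`, the toy decay dynamics, and the per-channel drift are hypothetical and are not taken from the paper.

```python
import numpy as np

def rollout_rmse(step_fn, x0, targets):
    """Roll a one-step model forward; return RMSE with shape (lead, channel).

    step_fn : hypothetical one-step model, maps a (C, H, W) state to (C, H, W)
    x0      : initial state, shape (C, H, W)
    targets : ground truth at each lead, shape (T, C, H, W)
    """
    state = x0
    errors = []
    for target in targets:
        state = step_fn(state)  # feed the model's own prediction back in
        # average squared error over the spatial axes only, keeping channels apart
        errors.append(np.sqrt(np.mean((state - target) ** 2, axis=(1, 2))))
    return np.stack(errors)

# Toy setup (assumed): the true dynamics decay each field by 0.9 per step.
rng = np.random.default_rng(0)
C, H, W, T = 3, 4, 8, 5
x0 = rng.normal(size=(C, H, W))
truth = np.stack([x0 * 0.9 ** (t + 1) for t in range(T)])

# Toy model (assumed): correct decay plus a different constant bias per channel.
drift = np.array([0.0, 0.01, 0.05]).reshape(C, 1, 1)
model = lambda s: 0.9 * s + drift

rmse = rollout_rmse(model, x0, truth)          # shape (T, C)
pooled = np.sqrt(np.mean(rmse ** 2, axis=1))   # globally pooled RMSE per lead
```

In this toy case the pooled curve is dominated by the channel with the largest bias, while the unbiased channel stays near zero at every lead, which is the kind of disagreement between pooled and per-channel metrics the summary describes.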
Abstract
Compute-optimal scaling laws are relatively well studied for NLP and CV, where objectives are typically single-step and targets are comparatively homogeneous. Weather forecasting is harder to characterize in the same framework: autoregressive rollouts compound errors over long horizons, outputs couple many physical channels with disparate scales and predictability, and globally pooled test metrics can disagree sharply with per-channel, late-lead behavior implied by short-horizon training. We extend neural scaling analysis for autoregressive weather forecasting from single-step training loss to long rollouts and per-channel metrics. We quantify (1) how prediction error is distributed across channels and how its growth rate evolves with forecast horizon, (2) whether power-law scaling holds for test error relative to rollout length when error is pooled globally, and (3) how that fit varies…
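One common way to check whether an error curve follows a power law in rollout length is a straight-line fit in log-log space. The sketch below is a generic illustration under that assumption, not the paper's fitting procedure (which this excerpt does not show); `fit_power_law` and both synthetic error curves are hypothetical.

```python
import numpy as np

def fit_power_law(lead, err):
    """Least-squares fit of err ~ a * lead**b in log-log space.

    Returns (a, b, r2), where r2 is the coefficient of determination
    of the linear fit to (log lead, log err).
    """
    x, y = np.log(lead), np.log(err)
    b, log_a = np.polyfit(x, y, 1)      # slope is the exponent b
    resid = y - (b * x + log_a)
    r2 = 1.0 - resid.var() / y.var()
    return np.exp(log_a), b, r2

lead = np.arange(1, 41, dtype=float)

# Synthetic channel (assumed) whose error follows an exact power law.
err_power = 0.5 * lead ** 0.7
# Synthetic channel (assumed) whose error saturates at long leads.
err_satur = 1.0 - np.exp(-lead / 5.0)

a, b, r2_power = fit_power_law(lead, err_power)
_, _, r2_satur = fit_power_law(lead, err_satur)
```

Comparing `r2_power` with `r2_satur` shows how fit quality can differ between channels: the first curve is recovered almost exactly, while the saturating curve is fit less well by a single power law.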
