Physical Consistency of Aurora's Encoder: A Quantitative Study
Benjamin Richards, Pushpa Kumar Balan

TL;DR
This study evaluates whether the internal representations of Aurora's encoder align with physical weather concepts, and it reveals both strengths and limitations in how the encoder captures key meteorological features.
Contribution
The paper provides a quantitative analysis of Aurora's encoder, showing that it learns physically consistent features and identifying areas for improvement.
Findings
Aurora's encoder learns physically meaningful features.
The model has limitations in capturing rare extreme events.
Interpretability methods are essential for trust in AI weather models.
Abstract
The high accuracy of large-scale weather forecasting models like Aurora is often accompanied by a lack of transparency, as their internal representations remain largely opaque. This "black box" nature hinders their adoption in high-stakes operational settings. In this work, we probe the physical consistency of Aurora's encoder by investigating whether its latent representations align with known physical and meteorological concepts. Using a large-scale dataset of embeddings, we train linear classifiers to identify three distinct concepts: the fundamental land-sea boundary, high-impact extreme temperature events, and atmospheric instability. Our findings provide quantitative evidence that Aurora learns physically consistent features, while also highlighting its limitations in capturing the rarest events. This work underscores the critical need for interpretability methods to validate and…
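The probing setup described in the abstract (a linear classifier trained on frozen encoder embeddings to detect a concept) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the embeddings are synthetic Gaussian vectors rather than Aurora outputs, the binary label stands in for a concept such as land versus sea, and the helper names (`make_synthetic_embeddings`, `train_linear_probe`, `accuracy`) are hypothetical. It is not the authors' pipeline, only an example of what a linear probe is.

```python
import math
import random


def make_synthetic_embeddings(n, dim, seed=0):
    """Hypothetical stand-in for encoder embeddings: Gaussian vectors whose
    binary label (e.g. land vs. sea) depends on one hidden linear direction."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Small noise term so the concept is not perfectly separable.
        score = sum(d * v for d, v in zip(direction, x)) + rng.gauss(0.0, 0.3)
        xs.append(x)
        ys.append(1 if score > 0 else 0)
    return xs, ys


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs, ys, epochs=50, lr=0.1, seed=0):
    """Fit logistic regression (a linear probe) with plain SGD.
    The embeddings are treated as fixed inputs; only w and b are learned."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            err = p - ys[i]  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xs[i])]
            b -= lr * err
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of samples whose thresholded linear score matches the label."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
        correct += pred == y
    return correct / len(xs)


# Train on the first 400 synthetic samples, evaluate on the held-out 200.
xs, ys = make_synthetic_embeddings(n=600, dim=8, seed=0)
train_x, train_y = xs[:400], ys[:400]
test_x, test_y = xs[400:], ys[400:]
w, b = train_linear_probe(train_x, train_y)
test_acc = accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {test_acc:.3f}")
```

High held-out accuracy from a probe this simple suggests the concept is linearly readable from the embeddings. For rare concepts such as extreme events, plain accuracy can look high even when the probe misses most positives, so a metric that accounts for class imbalance (for example precision and recall) is usually more informative.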
Taxonomy
Topics: Meteorological Phenomena and Simulations · Tropical and Extratropical Cyclones Research · Climate variability and models
