Mechanistic Interpretability Tool for AI Weather Models
Kirsten I. Tempest, Matthias Beylich, George C. Craig

TL;DR
This paper introduces an open-source interpretability tool for AI weather models, enabling analysis of internal representations to understand how predictions are generated.
Contribution
It adapts mechanistic interpretability concepts to weather models, providing initial analysis methods like cosine similarity and PCA for latent space exploration.
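As a rough illustration of the two analyses named above, the following is a minimal NumPy sketch, not the tool's actual code. The function names, the `(n_samples, n_features)` array layout, and the synthetic latent vectors are all assumptions made for this example; in practice the vectors would be extracted from a model's processor.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


def pca_directions(latents, n_components=2):
    """Return the leading principal directions and their explained-variance ratios.

    `latents` is assumed to have shape (n_samples, n_features).
    """
    x = np.asarray(latents, dtype=float)
    x_centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]


# Synthetic stand-in for latent vectors (hypothetical data, not model output).
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 16))
latents[:, 0] *= 5.0  # inflate variance along one axis so a dominant direction exists

components, ratio = pca_directions(latents, n_components=2)
# Compare the first principal direction with the axis that was inflated.
sim = cosine_similarity(components[0], np.eye(16)[0])
```

Here `abs(sim)` should be close to 1 because the first principal direction aligns with the inflated axis; the sign of a principal direction is arbitrary, so only the magnitude is meaningful.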
Findings
Identified latent directions associated with meteorological features.
Applied the tool to GraphCast, revealing interpretable features.
Demonstrated the tool's potential for understanding AI weather predictions.
Abstract
Artificial Intelligence (AI) weather models are improving rapidly, and their forecasts are already competitive with long-established traditional Numerical Weather Prediction (NWP). To build confidence in this new methodology, it is critical that we understand how these predictions are generated. This is a huge challenge as these AI weather models remain largely black boxes. In other areas of Machine Learning (ML), mechanistic interpretability has emerged as a framework for understanding ML predictions by analysing the building blocks responsible for them. Here we present an open-source, highly adaptable tool which incorporates concepts from mechanistic interpretability. The tool organises internal latent representations from the model processor and allows for initial analyses, including cosine similarity and Principal Component Analysis (PCA), enabling the user to identify directions in…
