HEAL-ViT: Vision Transformers on a spherical mesh for medium-range weather forecasting
Vivek Ramavajjala

TL;DR
HEAL-ViT introduces a novel vision transformer architecture on spherical meshes for weather forecasting, combining the advantages of graph-based and transformer models to improve accuracy and efficiency.
Contribution
The paper presents HEAL-ViT, a new model that applies Vision Transformers on spherical meshes, addressing grid distortion issues and improving weather forecast accuracy.
Findings
Outperforms ECMWF IFS on key metrics
Shows improved behavior on bias accumulation and blurring
Reduces computational footprint for operational use
Abstract
In recent years, a variety of ML architectures and techniques have seen success in producing skillful medium-range weather forecasts. In particular, Vision Transformer (ViT)-based models (e.g. Pangu-Weather, FuXi) have shown strong performance, working nearly "out-of-the-box" by treating weather data as a multi-channel image on a rectilinear grid. While a rectilinear grid is appropriate for 2D images, weather data is inherently spherical and thus heavily distorted at the poles on a rectilinear grid, leading to disproportionate compute being used to model data near the poles. Graph-based methods (e.g. GraphCast) do not suffer from this problem, as they map the longitude-latitude grid to a spherical mesh, but are generally more memory intensive and tend to need more compute resources for training and inference. While spatially homogeneous, the spherical mesh does not lend itself readily…
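The polar distortion described in the abstract can be made concrete with a small calculation. The sketch below compares the share of grid points lying poleward of a given latitude on a regular longitude-latitude grid with the share of the sphere's surface that region actually covers. The 0.25° resolution, the 60° cutoff, and the function names are illustrative assumptions, not values or code taken from the paper.

```python
import math

def band_area_fraction(lat_south_deg, lat_north_deg):
    """Fraction of a sphere's surface between two latitudes (degrees)."""
    s = math.sin(math.radians(lat_south_deg))
    n = math.sin(math.radians(lat_north_deg))
    return (n - s) / 2.0

def polar_share(resolution_deg=0.25, cutoff_deg=60.0):
    """Return (grid-point share, surface-area share) poleward of the cutoff.

    Assumes a cell-centred regular lat-lon grid; resolution and cutoff
    are hypothetical example values.
    """
    n_rows = int(round(180.0 / resolution_deg))
    lats = [-90.0 + (i + 0.5) * resolution_deg for i in range(n_rows)]
    # Every latitude row holds the same number of longitude points,
    # so the share of rows equals the share of grid points.
    polar_rows = sum(1 for lat in lats if abs(lat) > cutoff_deg)
    point_share = polar_rows / n_rows
    # Both polar caps together.
    area_share = 2.0 * band_area_fraction(cutoff_deg, 90.0)
    return point_share, area_share

if __name__ == "__main__":
    points, area = polar_share()
    print(f"grid points poleward of 60 deg: {points:.1%}")
    print(f"surface area poleward of 60 deg: {area:.1%}")
```

Under these assumptions, about a third of the grid points describe roughly 13% of the Earth's surface, which is the kind of imbalance a spatially homogeneous spherical mesh avoids.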
Taxonomy
Topics: Satellite Image Processing and Photogrammetry · Urban Heat Island Mitigation · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout
