CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer
Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai

TL;DR
This paper introduces VAEformer, a neural codec that compresses large climate datasets like ERA5 by over 300 times, enabling portable AI-based weather research without significant loss of forecasting accuracy.
Contribution
The paper presents a novel low-complexity variational autoencoder transformer for extreme climate data compression, outperforming existing methods and making large datasets more accessible.
Findings
ERA5 is compressed from 226 TB to 0.7 TB, producing the CRA5 dataset.
Forecasting models trained on CRA5 achieve accuracy comparable to those trained on original data.
VAEformer outperforms existing state-of-the-art methods for climate data compression.
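The two sizes reported in the findings can be checked against the "over 300 times" figure in the TL;DR. The following is a minimal arithmetic sketch; the function name `compression_ratio` is illustrative and does not come from the paper or its code.

```python
def compression_ratio(original_tb: float, compressed_tb: float) -> float:
    """Return how many times smaller the compressed dataset is (hypothetical helper)."""
    if compressed_tb <= 0:
        raise ValueError("compressed size must be positive")
    return original_tb / compressed_tb


# Sizes as stated in the findings: 226 TB before compression, 0.7 TB after.
ratio = compression_ratio(226.0, 0.7)
saved_fraction = 1.0 - 0.7 / 226.0

print(f"ratio: {ratio:.1f}x, storage saved: {saved_fraction:.2%}")
# prints "ratio: 322.9x, storage saved: 99.69%"
```

A ratio of roughly 323 is consistent with the "over 300 times" claim.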
Abstract
The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI-based meteorological research. To mitigate this issue, we introduce an efficient neural codec, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data to significantly reduce data storage cost, making AI-based meteorological research portable to researchers. Our approach diverges from recent complex neural codecs by utilizing a low-complexity Auto-Encoder transformer. This encoder produces a quantized latent representation through variance inference, which…
Taxonomy
Topics: Geophysics and Gravity Measurements
Methods: Attention Is All You Need · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer
