Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers
Shuochen Wang, Nishant Yadav, Auroop R. Ganguly

TL;DR
Paraformer is a Transformer-based model for parameterizing sub-grid scale processes in climate models. Trained on the ClimSim dataset, it is reported to capture complex dependencies better than classical deep-learning architectures.
Contribution
According to the authors, this work is the first to apply a Transformer model with an attention mechanism to climate parameterization, training it on ClimSim, the largest dataset created for climate parameterization.
Findings
Paraformer outperforms classical deep-learning architectures.
The model effectively captures complex non-linear dependencies.
The study demonstrates the potential of attention mechanisms in climate modeling.
Abstract
One of the major sources of uncertainty in the current generation of Global Climate Models (GCMs) is the representation of sub-grid scale physical processes. Over the years, a series of deep-learning-based parameterization schemes have been developed and tested on both idealized and real-geography GCMs. However, datasets on which previous deep-learning models were trained either contain limited variables or have low spatial-temporal coverage, which cannot fully simulate the parameterization process. Additionally, these schemes rely on classical architectures while the latest attention mechanism used in Transformer models remains unexplored in this field. In this paper, we propose Paraformer, a "memory-aware" Transformer-based model on ClimSim, the largest dataset ever created for climate parameterization. Our results demonstrate that the proposed model successfully captures the complex…
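For readers unfamiliar with the attention mechanism the abstract refers to, the following is a minimal sketch of generic single-head scaled dot-product attention in plain Python. It is not the Paraformer architecture, and it does not show the "memory-aware" component, which this summary does not describe. The function names and the toy `tokens` input are assumptions chosen for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Illustrative only; not the model proposed in the paper.
    """
    d = len(keys[0])  # feature dimension of each key
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        all_weights.append(w)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, all_weights


# Hypothetical toy input: three "tokens" with two features each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the inputs. This is what lets every position draw on information from every other position, the property the abstract contrasts with classical architectures.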
Taxonomy
Topics: Magnetic Properties and Applications · Neural Networks and Applications · Scientific Research and Discoveries
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam
