Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers

Shuochen Wang; Nishant Yadav; Auroop R. Ganguly

arXiv:2412.16763 · cs.LG · December 24, 2024

TL;DR

Paraformer introduces a Transformer-based model for parameterizing sub-grid scale processes in climate models. Trained on ClimSim, a large climate-parameterization dataset, it better captures complex dependencies and outperforms classical deep-learning architectures.

Contribution

This work is the first to apply Transformer models with an attention mechanism to climate parameterization, using ClimSim, the largest dataset created for climate parameterization, to improve accuracy.
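Paraformer builds on the attention mechanism of Transformer models. As a generic illustration only (not the paper's code), the sketch below implements standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, in plain Python. Queries, keys, and values are lists of row vectors; the toy inputs `x` are hypothetical and stand in for per-token feature vectors.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d_k)) V for row-major lists of vectors."""
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(v[0])
    out: Matrix = []
    for q_row in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(d_v)]
        )
    return out


# Self-attention over three hypothetical tokens, each a 2-feature vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = scaled_dot_product_attention(x, x, x)
```

Because each output row is a convex combination of the value rows, every token can draw information from every other token in one step; multi-head attention repeats this with several learned projections of Q, K, and V.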

Findings

01

Paraformer outperforms classical deep-learning architectures.

02

The model effectively captures complex non-linear dependencies.

03

The study demonstrates the potential of attention mechanisms in climate modeling.

Abstract

One of the major sources of uncertainty in the current generation of Global Climate Models (GCMs) is the representation of sub-grid scale physical processes. Over the years, a series of deep-learning-based parameterization schemes have been developed and tested on both idealized and real-geography GCMs. However, the datasets on which previous deep-learning models were trained either contain a limited set of variables or have low spatiotemporal coverage, so they cannot fully represent the parameterization process. Additionally, these schemes rely on classical architectures, while the attention mechanism used in recent Transformer models remains unexplored in this field. In this paper, we propose Paraformer, a "memory-aware" Transformer-based model trained on ClimSim, the largest dataset ever created for climate parameterization. Our results demonstrate that the proposed model successfully captures the complex…
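The abstract does not specify how Paraformer arranges its inputs. One plausible way to feed a single atmospheric column to a Transformer, assumed here purely for illustration, is to treat each vertical level as a token carrying the profile variables at that level (for example temperature and humidity) plus the column-wide scalar inputs. The helper `column_to_tokens` and its variable-major layout are hypothetical; they are not ClimSim's documented format or the paper's preprocessing.

```python
from typing import List, Sequence


def column_to_tokens(flat: Sequence[float], n_levels: int, n_profiles: int) -> List[List[float]]:
    """Split a flat column input vector into one token per vertical level.

    Hypothetical layout (an assumption, not ClimSim's documented format):
    the first n_profiles * n_levels entries are profile variables stored
    variable-major (all levels of variable 0, then variable 1, ...); any
    remaining entries are column scalars, appended to every token.
    """
    n_prof_values = n_profiles * n_levels
    if len(flat) < n_prof_values:
        raise ValueError("input shorter than n_profiles * n_levels")
    scalars = list(flat[n_prof_values:])
    tokens: List[List[float]] = []
    for level in range(n_levels):
        # Gather the value of each profile variable at this level.
        token = [flat[p * n_levels + level] for p in range(n_profiles)]
        tokens.append(token + scalars)
    return tokens


# Toy column: 3 levels, 2 profile variables, 1 scalar (all values made up).
tokens = column_to_tokens([280.0, 270.0, 260.0, 0.01, 0.005, 0.001, 1000.0], n_levels=3, n_profiles=2)
```

A token sequence built this way could then pass through self-attention so that each level attends to every other level; whether Paraformer tokenizes columns like this is not stated in this summary.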

Taxonomy

Topics: Magnetic Properties and Applications · Neural Networks and Applications · Scientific Research and Discoveries

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam