Vision Transformers for Multi-Variable Climate Downscaling: Emulating Regional Climate Models with a Shared Encoder and Multi-Decoder Architecture

Fabio Merizzi; Harilaos Loukos

arXiv:2506.22447·cs.LG·February 17, 2026

TL;DR

This paper introduces a multi-variable Vision Transformer architecture for regional climate downscaling that improves accuracy and reduces computational costs by jointly modeling six climate variables from GCM data.

Contribution

The paper presents a novel multi-variable ViT with shared encoder and variable-specific decoders, outperforming single-variable models and other baselines in climate downscaling tasks.

Findings

1. Average MSE reduced by 5.5% compared to single-variable models
2. Achieved 29-32% lower inference time per variable
3. Outperformed alternative multi-variable baselines

Abstract

Global Climate Models (GCMs) are critical for simulating large-scale climate dynamics, but their coarse spatial resolution limits their applicability in regional studies. Regional Climate Models (RCMs) address this limitation through dynamical downscaling, albeit at considerable computational cost and with limited flexibility. Deep learning has emerged as an efficient data-driven alternative; however, most existing approaches focus on single-variable models that downscale one variable at a time. This paradigm can lead to redundant computation, limited contextual awareness, and weak cross-variable interactions. To address these limitations, we propose a multi-variable Vision Transformer (ViT) architecture with a shared encoder and variable-specific decoders (1EMD). The proposed model jointly predicts six key climate variables: surface temperature, wind speed, 500 hPa geopotential height,…
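The shared-encoder, multi-decoder idea can be illustrated with a minimal toy sketch. This is not the authors' implementation and contains no Transformer layers: `CountingEncoder`, `make_decoder`, `joint_forward`, and `separate_forward` are hypothetical names, the "encoder" is a row-mean placeholder, and each "decoder" is a scaled nearest-neighbour upsampler. Only the three variables named in the visible part of the abstract are used. The sketch shows only the structural point: the encoder runs once per input in the joint setup, whereas one single-variable model per variable repeats the encoding step.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]

# Only these three variables are named in the visible part of the abstract;
# the remaining ones are elided in the source and are not guessed here.
VARIABLES = ["surface_temperature", "wind_speed", "geopotential_500hPa"]


class CountingEncoder:
    """Toy stand-in for a shared encoder; counts how many times it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, coarse: Grid) -> List[float]:
        self.calls += 1
        # Placeholder "features": the mean of each row of the coarse input.
        return [sum(row) / len(row) for row in coarse]


def make_decoder(scale: float, factor: int = 2) -> Callable[[List[float]], Grid]:
    """Toy variable-specific decoder: scale each feature, then repeat it
    into a factor x factor block (nearest-neighbour style upsampling)."""

    def decode(features: List[float]) -> Grid:
        out: Grid = []
        for f in features:
            for _ in range(factor):
                out.append([scale * f] * factor)
        return out

    return decode


def joint_forward(encoder, decoders: Dict[str, Callable], coarse: Grid) -> Dict[str, Grid]:
    """Shared-encoder setup: encode once, decode once per variable."""
    latent = encoder(coarse)
    return {name: dec(latent) for name, dec in decoders.items()}


def separate_forward(encoder, decoders: Dict[str, Callable], coarse: Grid) -> Dict[str, Grid]:
    """Single-variable setup (simulated): the encoding step is repeated
    for every variable, as if each model carried its own encoder."""
    return {name: dec(encoder(coarse)) for name, dec in decoders.items()}


coarse_input = [[1.0, 3.0], [2.0, 4.0]]
# Hypothetical per-variable scales, chosen only to make the outputs differ.
decoders = {name: make_decoder(scale=float(i + 1)) for i, name in enumerate(VARIABLES)}

shared = CountingEncoder()
joint_out = joint_forward(shared, decoders, coarse_input)

repeated = CountingEncoder()
separate_out = separate_forward(repeated, decoders, coarse_input)

print(shared.calls, repeated.calls)
```

In this toy, both setups produce identical outputs because the placeholder encoder has no trainable parameters; in a trained model, the shared representation is also where cross-variable information could be exchanged, which the sketch does not attempt to show.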
