ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models

Ehsan Zeraatkar; Salah Faroughi; Jelena Tešić

arXiv:2502.06741 · cs.CV · May 27, 2025

TL;DR

This paper introduces ViSIR, a deep learning model that combines Vision Transformers with SIREN to improve single-image reconstruction of Earth system model data, outperforming SRCNN, ViT, SIREN, and SRGAN baselines in PSNR.

Contribution

The paper presents ViSIR, a new hybrid neural network architecture that improves spectral detail preservation and reconstruction quality in Earth system model data.
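The TL;DR describes ViSIR as pairing a Vision Transformer with SIREN. In a standard ViT, the input image is first cut into non-overlapping patches that become the transformer's tokens. The sketch below shows only that tokenization step for a single-channel grid; the function name `image_to_patches`, the patch size, and the single-channel input are illustrative assumptions, and ViSIR's actual patch size and embedding are not given in this summary.

```python
from typing import List


def image_to_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Hypothetical helper: split an H x W single-channel image into
    non-overlapping patch x patch tiles, each flattened row-major,
    returned in raster order (the ViT tokenization step)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Gather one tile row by row and flatten it into a token vector.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# Example: a 4 x 4 grid with 2 x 2 patches yields four tokens of length 4.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = image_to_patches(img, 2)
```

In the original ViT, each flattened patch is then linearly projected and combined with position encodings before the attention layers.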

Findings

01. ViSIR outperforms SRCNN, ViT, SIREN, and SRGANs in PSNR.
02. ViSIR achieves lower MSE and higher PSNR and SSIM than the compared methods.
03. The method effectively preserves high-frequency details in reconstructed images.
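The findings report MSE, PSNR, and SSIM. MSE and PSNR are linked by PSNR = 10·log10(MAX² / MSE), where MAX is the data range, so a lower MSE always means a higher PSNR. A minimal sketch in Python, assuming images are flattened to equal-length lists and scaled to a data range of 1.0 (the paper's normalization is not stated here); SSIM is omitted, and the helper names are illustrative.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal length."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    err = mse(reference, reconstruction)
    if err == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(data_range ** 2 / err)


def mse_reduction_factor(psnr_gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain at a fixed data_range."""
    return 10.0 ** (psnr_gain_db / 10.0)


ref = [0.0, 0.5, 1.0, 0.25]
rec = [0.1, 0.5, 0.9, 0.25]
print(mse(ref, rec), psnr(ref, rec))  # MSE about 0.005, PSNR about 23.0 dB
```

At a fixed data range, a PSNR gain of Δ dB means the MSE shrinks by a factor of 10^(Δ/10); for example, `mse_reduction_factor(2.16)` is about 1.64 and `mse_reduction_factor(8.34)` is about 6.8. Because the reported gains are averages over three measurements, these factors are only a rough reading of the numbers.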

Abstract

Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. Because ESMs are highly complex, deep neural network architectures are used to model this complexity and to store the down-sampled data. This paper proposes Vision Transformer Sinusoidal Representation Networks (ViSIR) to improve single-image super-resolution (SR) reconstruction of ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: ViSIR outperforms SRCNN by 2.16 dB, ViT by 6.29 dB, SIREN by 8.34 dB, and Super-Resolution Generative Adversarial Networks (SRGANs) by 7.93 dB in PSNR, on average across three different measurements.…
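The abstract credits SIREN with preserving high-frequency detail. A SIREN layer applies a scaled sine, y = sin(ω0·(Wx + b)); with a large ω0, a coordinate network can fit fine detail that ReLU networks tend to learn slowly (the spectral bias the abstract mentions). The sketch below follows the layer definition and weight initialization published by Sitzmann et al. (2020) with their default ω0 = 30; it is not the ViSIR code, the bias initialization is an assumption, and how ViSIR connects these layers to the ViT is not described in this summary.

```python
import math
import random
from typing import List, Optional


class SineLayer:
    """One SIREN layer: y = sin(omega_0 * (W x + b)).

    Weight init follows Sitzmann et al. (2020): the first layer samples
    U(-1/n_in, 1/n_in); later layers sample
    U(-sqrt(6/n_in)/omega_0, sqrt(6/n_in)/omega_0), so activation
    statistics stay stable across layers despite the omega_0 scaling.
    """

    def __init__(self, n_in: int, n_out: int, is_first: bool = False,
                 omega_0: float = 30.0, rng: Optional[random.Random] = None):
        if rng is None:
            rng = random.Random()
        self.omega_0 = omega_0
        if is_first:
            w_bound = 1.0 / n_in
        else:
            w_bound = math.sqrt(6.0 / n_in) / omega_0
        self.weight = [[rng.uniform(-w_bound, w_bound) for _ in range(n_in)]
                       for _ in range(n_out)]
        # Assumed bias init: PyTorch-style U(-1/sqrt(n_in), 1/sqrt(n_in)).
        b_bound = 1.0 / math.sqrt(n_in)
        self.bias = [rng.uniform(-b_bound, b_bound) for _ in range(n_out)]

    def __call__(self, x: List[float]) -> List[float]:
        # Affine map followed by a scaled sine; outputs lie in [-1, 1].
        return [math.sin(self.omega_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
                for row, b in zip(self.weight, self.bias)]


# Hypothetical usage: map a normalized pixel coordinate (x, y) in [-1, 1]^2
# to hidden features, as a SIREN coordinate network does.
rng = random.Random(0)
first = SineLayer(2, 16, is_first=True, rng=rng)
hidden = SineLayer(16, 16, rng=rng)
features = hidden(first([0.25, -0.5]))
```

In the SIREN paper the final layer is linear rather than sinusoidal; it is left out here to keep the sketch short.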

Taxonomy

Topics: Geological Modeling and Analysis · Satellite Image Processing and Photogrammetry

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Vision Transformer