ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models
Ehsan Zeraatkar, Salah Faroughi, Jelena Tešić

TL;DR
This paper introduces ViSIR, a deep learning model that combines Vision Transformers and SIREN to improve single image reconstruction in Earth system models, outperforming SRCNN, ViT, SIREN, and SRGANs in PSNR.
Contribution
The paper presents ViSIR, a new hybrid neural network architecture that improves spectral detail preservation and reconstruction quality in Earth system model data.
Findings
ViSIR outperforms SRCNN, ViT, SIREN, and SRGANs in PSNR.
ViSIR achieves better MSE, PSNR, and SSIM scores (lower MSE, higher PSNR and SSIM).
The method effectively preserves high-frequency details in reconstructed images.
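For reference, MSE and PSNR are related by PSNR = 10 log10(MAX^2 / MSE), so a lower MSE corresponds to a higher PSNR. The following is a minimal sketch of both metrics, assuming images are given as flat sequences of pixel values in [0, 1]. The function names and the sample values are illustrative and are not taken from the paper.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of the same length")
    total = sum((r - p) ** 2 for r, p in zip(reference, reconstructed))
    return total / len(reference)


def psnr(reference, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB.

    max_value is the assumed peak pixel value (1.0 for normalized data).
    """
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


# Illustrative values only: every pixel is off by 0.1.
reference = [0.0, 0.5, 1.0, 0.25]
reconstructed = [0.1, 0.4, 0.9, 0.35]
print(mse(reference, reconstructed), psnr(reference, reconstructed))
```

With every pixel off by 0.1 on a unit range, the MSE is about 0.01 and the PSNR is about 20 dB. Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in MSE.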
Abstract
Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. ESMs are highly complex; thus, deep neural network architectures are used to model the complexity and store the down-sampled data. This paper proposes the Vision Transformer Sinusoidal Representation Network (ViSIR) to improve the single image super-resolution (SR) reconstruction task for ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: ViSIR outperforms SRCNN by 2.16 dB, ViT by 6.29 dB, SIREN by 8.34 dB, and SR-Generative Adversarial Networks (SRGANs) by 7.93 dB PSNR on average for three different measurements.…
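To illustrate the SIREN component mentioned in the abstract, the sketch below implements a single sine-activated layer, y = sin(w0 * (Wx + b)), which is the building block SIREN uses to represent high-frequency detail. It shows the SIREN layer in isolation only: the abstract does not say how ViSIR connects it to the ViT, so nothing here should be read as the authors' architecture. The names sine_layer and init_first_layer, the value w0 = 30, the uniform bias initialization, and the input values are assumptions made for illustration.

```python
import math
import random


def sine_layer(x, weights, biases, w0=30.0):
    """Compute y_j = sin(w0 * (sum_i W[j][i] * x[i] + b[j])) for each output j."""
    return [
        math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


def init_first_layer(in_features, out_features, rng):
    """Draw weights uniformly from [-1/in_features, 1/in_features].

    This follows the first-layer scheme commonly used for SIREN; drawing
    the biases from the same range is an assumption made here.
    """
    bound = 1.0 / in_features
    weights = [
        [rng.uniform(-bound, bound) for _ in range(in_features)]
        for _ in range(out_features)
    ]
    biases = [rng.uniform(-bound, bound) for _ in range(out_features)]
    return weights, biases


# Hypothetical usage: map a 2-D coordinate to 4 sinusoidal features.
rng = random.Random(0)
weights, biases = init_first_layer(2, 4, rng)
features = sine_layer([0.25, -0.5], weights, biases)
print(features)
```

The frequency factor w0 scales the pre-activation so that the sine spans several periods over the input range, which is what lets such a layer fit fine detail that a ReLU network tends to smooth out.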
Taxonomy
Topics: Geological Modeling and Analysis · Satellite Image Processing and Photogrammetry
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Vision Transformer
