Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model
Denys Godwin, Hanxi Li, Michael Cecil, and Hamed Alemohammad

TL;DR
This paper compares a pretrained, fine-tuned Vision Transformer (ViT) with a CGAN trained from scratch for imputing cloud-covered pixels in time series of multispectral satellite imagery, with the aim of improving data quality for time series analysis.
Contribution
The paper compares ViT and CGAN models for cloud gap imputation in satellite imagery and highlights the effectiveness of pretrained ViT models.
Findings
ViT outperforms CGAN in imputation accuracy
Pretrained ViT achieves better structural similarity scores
Models effectively reconstruct missing cloud-covered pixels
Abstract
Filling cloudy pixels in multispectral satellite imagery is essential for accurate data analysis and downstream applications, especially for tasks which require time series data. To address this issue, we compare the performance of a foundational Vision Transformer (ViT) model with a baseline Conditional Generative Adversarial Network (CGAN) model for missing value imputation in time series of multispectral satellite imagery. We randomly mask time series of satellite images using real-world cloud masks and train each model to reconstruct the missing pixels. The ViT model is fine-tuned from a pretrained model, while the CGAN is trained from scratch. Using quantitative evaluation metrics such as structural similarity index and mean absolute error as well as qualitative visual analysis, we assess imputation accuracy and contextual preservation.
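The masking and scoring steps described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the array layout (time, channel, height, width), the function names `apply_cloud_mask` and `masked_mae`, the fill value of 0, and the random stand-in data are all assumptions made for the example. Only mean absolute error is shown; the structural similarity index is omitted.

```python
import numpy as np


def apply_cloud_mask(series, cloud_mask, fill_value=0.0):
    """Hide cloudy pixels in an image time series.

    series:     float array, shape (T, C, H, W)  (assumed layout)
    cloud_mask: bool array, shape (T, H, W), True where a pixel is cloudy
    """
    masked = series.copy()
    # Broadcast the per-timestep mask across all spectral channels.
    mask4d = np.broadcast_to(cloud_mask[:, None, :, :], series.shape)
    masked[mask4d] = fill_value
    return masked


def masked_mae(prediction, target, cloud_mask):
    """Mean absolute error computed only over the masked (cloudy) pixels."""
    mask4d = np.broadcast_to(cloud_mask[:, None, :, :], target.shape)
    if not mask4d.any():
        raise ValueError("cloud_mask selects no pixels")
    return float(np.abs(prediction[mask4d] - target[mask4d]).mean())


# Random stand-in data: 3 time steps, 6 bands, 8x8 pixels (hypothetical sizes).
rng = np.random.default_rng(0)
series = rng.random((3, 6, 8, 8)).astype(np.float32)
cloud_mask = rng.random((3, 8, 8)) < 0.3

masked = apply_cloud_mask(series, cloud_mask)
mae_perfect = masked_mae(series, series, cloud_mask)   # ideal reconstruction
mae_unfilled = masked_mae(masked, series, cloud_mask)  # gaps left at fill value
```

Restricting the error to masked pixels keeps the score from being diluted by pixels the model never had to reconstruct. Whether the paper scores only masked pixels or the full image is not stated in the abstract.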
Taxonomy
Topics: Hydrological Forecasting Using AI · Data Management and Algorithms · Traffic Prediction and Management Techniques
Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Linear Layer · Dense Connections
