Towards Knowledge Guided Pretraining Approaches for Multimodal Foundation Models: Applications in Remote Sensing
Praveen Ravirathinam, Ajitesh Parthasarathy, Ankush Khandelwal, Rahul Ghosh, Vipin Kumar

TL;DR
This paper introduces KG-VSF, a knowledge-guided pretraining method for multimodal models in remote sensing that captures causal relationships, improving downstream task performance.
Contribution
It proposes a novel pretraining task that models forecasting as conditional generation, integrating causal knowledge into foundation models for remote sensing.
Findings
Pretraining with KG-VSF yields embeddings that perform better when finetuned on downstream tasks.
Performance improves on crop mapping, soil moisture estimation, and image forecasting.
KG-VSF outperforms standard pretraining approaches on causality-sensitive tasks.
Abstract
Self-supervised learning has emerged as a powerful paradigm for pretraining foundation models using large-scale data. Existing pretraining approaches predominantly rely on masked reconstruction or next-token prediction strategies, demonstrating strong performance across various downstream tasks, including geoscience applications. However, these approaches do not fully capture the knowledge of causal interplay between different geospatial and environmental variables. To address this limitation, we propose Knowledge Guided Variable-Step Forecasting (KG-VSF), a novel pretraining task that models forecasting as a conditional generation task, where driver variables (e.g., weather) inform the prediction of response variables (e.g., satellite imagery). We demonstrate that pretraining in such a fashion leads to strong embeddings which give enhanced performance when finetuned on downstream tasks…
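To make the idea of forecasting as conditional generation concrete, here is a minimal sketch of how one training sample could be assembled. It is not the authors' implementation: the function name `sample_vsf_pair`, the array shapes, the uniform sampling of the forecast step, and the zero-padded driver window with a mask are all assumptions made for illustration. The only parts taken from the abstract are that a driver variable (e.g., weather) conditions the prediction of a response variable (e.g., satellite imagery) and that the forecast step varies.

```python
import numpy as np


def sample_vsf_pair(images, drivers, max_step, rng):
    """Draw one (input, condition, target) sample for variable-step forecasting.

    Hypothetical helper, not from the paper.
    images:  array (T, C, H, W), the response variable (e.g. satellite imagery)
    drivers: array (T, D), the driver variable (e.g. weather per time step)
    """
    T = images.shape[0]
    if drivers.shape[0] != T:
        raise ValueError("images and drivers must share the time axis")
    if not 1 <= max_step < T:
        raise ValueError("max_step must be in [1, T - 1]")

    # Assumption: the forecast horizon is drawn uniformly for each sample.
    step = int(rng.integers(1, max_step + 1))
    # Anchor time chosen so that t + step is still a valid index.
    t = int(rng.integers(0, T - step))

    # Drivers observed between the anchor and the target time,
    # zero-padded to a fixed length so samples can be batched.
    window = np.zeros((max_step, drivers.shape[1]), dtype=drivers.dtype)
    window[:step] = drivers[t + 1 : t + step + 1]
    mask = np.arange(max_step) < step  # True where the window holds real data

    return {
        "input_image": images[t],
        "driver_window": window,
        "driver_mask": mask,
        "step": step,
        "target_image": images[t + step],
    }


# Toy usage with synthetic data: 10 time steps, 1-band 2x2 images, 3 drivers.
rng = np.random.default_rng(0)
images = np.arange(10 * 1 * 2 * 2, dtype=np.float32).reshape(10, 1, 2, 2)
drivers = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)
sample = sample_vsf_pair(images, drivers, max_step=4, rng=rng)
```

In this sketch, a model would receive `input_image`, `driver_window`, and `driver_mask`, and be trained to generate `target_image`. How the actual method encodes the drivers or chooses the step is not specified in the abstract.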
