GeoMeld: Toward Semantically Grounded Foundation Models for Remote Sensing

Maram Hasan; Md Aminur Hossain; Savitra Roy; Souparna Bhowmik; Ayush V. Patel; Mainak Singha; Subhasis Chaudhuri; Muhammad Haris Khan; Biplab Banerjee

arXiv:2604.10591 · cs.CV · April 14, 2026


TL;DR

GeoMeld introduces a large-scale, multimodal remote sensing dataset with semantically grounded supervision and a pretraining framework that enhances cross-sensor robustness and semantic understanding.

Contribution

The paper presents GeoMeld, a new multimodal remote sensing dataset, and GeoMeld-FM, a pretraining framework built on it. Together they advance semantically grounded foundation models for remote sensing.

Findings

01

GeoMeld improves downstream transfer performance.

02

GeoMeld-FM enhances cross-sensor robustness.

03

Joint training captures physical consistency and semantics.

Abstract

Effective foundation modeling in remote sensing requires spatially aligned heterogeneous modalities coupled with semantically grounded supervision, yet such resources remain limited at scale. We present GeoMeld, a large-scale multimodal dataset with approximately 2.5 million spatially aligned samples. The dataset spans diverse modalities and resolutions and is constructed under a unified alignment protocol for modality-aware representation learning. GeoMeld provides semantically grounded language supervision through an agentic captioning framework that synthesizes and verifies annotations from spectral signals, terrain statistics, and structured geographic metadata, encoding measurable cross-modality relationships within textual descriptions. To leverage this dataset, we introduce GeoMeld-FM, a pretraining framework that combines multi-pretext masked autoencoding over aligned…
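The abstract describes captions that are synthesized and then verified against spectral signals and terrain statistics, so that each textual claim corresponds to a measurable quantity. The sketch below illustrates only that verification idea and is not the authors' pipeline. It rests on several assumptions: a spectral cue is taken to be NDVI computed from red and near-infrared bands, a terrain cue is taken to be elevation relief from a co-registered DEM tile, and the phrase vocabulary, thresholds, and the names `AlignedSample`, `measure`, `verify`, and `caption` are all hypothetical. A fixed list of proposed phrases stands in for the agentic generator.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AlignedSample:
    """Hypothetical co-registered sample: two optical bands plus a DEM tile."""
    red: np.ndarray        # surface reflectance, red band
    nir: np.ndarray        # surface reflectance, near-infrared band
    elevation: np.ndarray  # elevation in metres on the same pixel grid


def measure(sample: AlignedSample) -> dict:
    """Compute the measurable statistics that a caption is allowed to cite."""
    ndvi = (sample.nir - sample.red) / (sample.nir + sample.red + 1e-6)
    return {
        "mean_ndvi": float(ndvi.mean()),
        "relief_m": float(sample.elevation.max() - sample.elevation.min()),
    }


# Each candidate phrase is paired with a predicate over the measured statistics.
# Thresholds are illustrative assumptions, not values taken from GeoMeld.
CLAIMS = {
    "densely vegetated": lambda s: s["mean_ndvi"] > 0.5,
    "sparsely vegetated": lambda s: s["mean_ndvi"] <= 0.2,
    "hilly terrain": lambda s: s["relief_m"] > 100.0,
    "flat terrain": lambda s: s["relief_m"] < 20.0,
}


def verify(proposed: list[str], stats: dict) -> list[str]:
    """Keep only known phrases whose predicate holds for this sample."""
    return [p for p in proposed if p in CLAIMS and CLAIMS[p](stats)]


def caption(sample: AlignedSample, proposed: list[str]) -> str:
    """Join the verified phrases and append the numbers that support them."""
    stats = measure(sample)
    kept = verify(proposed, stats)
    facts = f"mean NDVI {stats['mean_ndvi']:.2f}, relief {stats['relief_m']:.0f} m"
    return f"{', '.join(kept)} ({facts})" if kept else f"({facts})"


# Usage: a vegetated, hilly 4x4 tile; the unsupported "flat terrain" is dropped.
s = AlignedSample(
    red=np.full((4, 4), 0.05),
    nir=np.full((4, 4), 0.45),
    elevation=np.linspace(200.0, 450.0, 16).reshape(4, 4),
)
print(caption(s, ["densely vegetated", "flat terrain", "hilly terrain"]))
# → densely vegetated, hilly terrain (mean NDVI 0.80, relief 250 m)
```

Embedding the supporting measurements in the caption text mirrors the abstract's point that the descriptions encode measurable cross-modality relationships. In a real system the proposed phrases would come from a language model rather than a fixed list.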


Code & Models

Datasets

vimageiitb/GeoMeld (dataset, 1.7k downloads)
