Towards Vision-Language Geo-Foundation Model: A Survey

Yue Zhou; Zhihang Zhong; Xue Yang

arXiv:2406.09385 · cs.CV · January 6, 2026 · 6 cites

Open Access · 1 Repo

TL;DR

This survey reviews the development of Vision-Language Geo-Foundation Models (VLGFMs), covering their integration of geospatial data, core technologies, applications, and future research directions.

Contribution

It is the first comprehensive review of VLGFMs, systematically summarizing recent advances and core methodologies and discussing future challenges in geospatial multimodal modeling.

Findings

01 VLGFMs leverage large-scale geospatial multimodal data.

02 Core technologies include specialized data construction and model architectures.

03 VLGFMs show promising applications in earth observation tasks.

Abstract

Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs fine-tuned on them have been proposed recently. These new approaches aim to leverage large-scale, multimodal geospatial data to build versatile intelligent models with diverse geo-perceptive capabilities, which we refer to as Vision-Language Geo-Foundation Models (VLGFMs). This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field. In particular, we introduce the background and motivation behind the rise of VLGFMs, highlighting their unique research…

Code & Models

Repositories

zytx121/awesome-vlgfm (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies