On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence

Gengchen Mai; Weiming Huang; Jin Sun; Suhang Song; Deepak Mishra; Ninghao Liu; Song Gao; Tianming Liu; Gao Cong; Yingjie Hu; Chris Cundy; Ziyuan Li; Rui Zhu; Ni Lao

arXiv:2304.06798·cs.AI·April 17, 2023·66 cites

Open Access

TL;DR

This paper explores the potential and challenges of developing multimodal foundation models for geospatial AI, evaluating existing models across diverse tasks and emphasizing the importance of multimodal reasoning.

Contribution

It provides an empirical assessment of foundation models on geospatial tasks and discusses the challenges of multimodality in GeoAI, proposing directions for future multimodal foundation models.

Findings

01

LLMs outperform task-specific models on text-only geospatial tasks in zero-shot/few-shot settings.

02

Foundation models underperform on multimodal geospatial tasks compared to specialized models.

03

Addressing multimodality is crucial for developing effective foundation models for GeoAI.

Abstract

Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality such as toponym recognition, location description recognition, and US…

Taxonomy

Topics: Data-Driven Disease Surveillance · Human Mobility and Location-Based Analysis