GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras, Dimitrios Michail, Xiao Xiang Zhu, Begüm Demir, Ioannis Papoutsis

TL;DR
GAIA is a comprehensive, scientifically curated, multi-modal remote sensing (RS) dataset that improves vision-language models' performance on RS-specific tasks by providing diverse, high-quality image-text pairs across various modalities and applications.
Contribution
The paper introduces GAIA, a large-scale, multi-modal remote sensing dataset with scientifically grounded captions, addressing the domain gap in existing vision-language models.
Findings
GAIA improves RS image classification accuracy.
GAIA enhances cross-modal retrieval performance.
GAIA benefits RS image captioning tasks.
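Cross-modal retrieval is commonly scored with Recall@K: the fraction of queries whose paired item appears among the top K results when a gallery is ranked by embedding similarity. The sketch below illustrates that metric for text-to-image retrieval, assuming embeddings have already been computed and that query i is paired with gallery item i. The function names and toy vectors are hypothetical, and the paper's exact evaluation protocol is not stated here.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a (non-zero) vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(queries, gallery):
    """Return sims[i][j] = cosine similarity between queries[i] and gallery[j]."""
    q = [l2_normalize(v) for v in queries]
    g = [l2_normalize(v) for v in gallery]
    return [[sum(a * b for a, b in zip(qi, gj)) for gj in g] for qi in q]


def recall_at_k(sims, k):
    """Fraction of queries whose paired gallery item (same index) ranks in the top k.

    Assumption: query i is paired with gallery item i, as in a paired image-text set.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Rank gallery indices from most to least similar to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sims)


# Hypothetical toy embeddings (2-D for readability).
sims = cosine_matrix(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # text queries
    [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],  # images
)
print([round(recall_at_k(sims, k), 3) for k in (1, 2, 3)])  # [0.667, 0.667, 1.0]
```

Here the third text query is closest to the first two images, so it misses at K=1 and K=2 and is only counted at K=3.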
Abstract
Existing Vision-Language Models (VLMs) are predominantly trained on web-scraped, noisy image-text data and have limited exposure to the specialized domain of remote sensing (RS). This deficiency results in poor performance on RS-specific tasks, because commonly used datasets often lack detailed, scientifically accurate textual descriptions and instead focus solely on attributes such as date and location. To bridge this critical gap, we introduce GAIA, a novel dataset designed for multi-scale, multi-sensor, and multi-modal RS image analysis. GAIA comprises 201,005 meticulously curated RS image-text pairs, representing a diverse range of RS modalities associated with different spatial resolutions. Unlike existing vision-language datasets in RS, GAIA specifically focuses on capturing a diverse range of RS applications, providing unique information about environmental changes, natural disasters, and various…
Taxonomy
Topics: Geographic Information Systems Studies · Genomics and Phylogenetic Studies · Remote-Sensing Image Classification
Methods: Contrastive Language-Image Pre-training
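Contrastive Language-Image Pre-training (CLIP) trains an image encoder and a text encoder so that matching image-text pairs score higher than mismatched pairs within a batch. The following is a minimal sketch of the standard symmetric contrastive (InfoNCE) loss used in CLIP-style training, assuming the paired embeddings have already been produced by the two encoders. It is not the authors' training code; the function names are hypothetical, and the fixed temperature of 0.07 is an illustrative choice (CLIP learns this value, starting from 0.07).

```python
import math


def l2_normalize(v):
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of N paired embeddings.

    Pair i (image_emb[i], text_emb[i]) is the positive; the other N - 1
    items in the batch act as negatives. temperature is a fixed,
    illustrative hyperparameter in this sketch.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row classifies over texts; the target is the diagonal.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: same classification over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
print(round(clip_loss(images, [[1.0, 0.0], [0.0, 1.0]]), 4))  # aligned pairs: ≈ 0.0
print(round(clip_loss(images, [[0.0, 1.0], [1.0, 0.0]]), 2))  # swapped pairs: ≈ 14.29
```

Averaging the two directions makes the objective symmetric, so both image-to-text and text-to-image retrieval are trained at once; fine-tuning on domain-specific pairs such as GAIA's would reuse the same loss with RS image-caption batches.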
