SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal

TL;DR
SkyScript is a large, diverse remote sensing vision-language dataset created using geo-coordinates and OpenStreetMap, enabling significant improvements in zero-shot scene classification and other multi-modal remote sensing tasks.
Contribution
The paper introduces SkyScript, a novel large-scale remote sensing dataset linking images with rich semantic tags via geo-coordinates, facilitating the development of versatile vision-language models.
Findings
Achieved a 6.2% average accuracy gain over baseline models in zero-shot scene classification
Demonstrated effective zero-shot transfer for object attribute classification
Enabled cross-modal retrieval and captioning in remote sensing
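Zero-shot scene classification with a vision-language model is typically done by comparing an image embedding against text embeddings of candidate class names and picking the closest one. The sketch below illustrates that general idea only; it is not the authors' code. The embeddings are hand-written toy vectors standing in for real encoder outputs, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, class_text_embs):
    """Return the class whose text embedding is most similar to the image.

    class_text_embs maps a class name to the embedding of its text prompt.
    No classifier is trained: the label set can be changed freely.
    """
    scores = {name: cosine(image_emb, emb) for name, emb in class_text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings (assumption: a real model would produce these vectors).
class_text_embs = {
    "airport": [0.9, 0.1, 0.0],
    "farmland": [0.1, 0.9, 0.1],
    "harbor": [0.0, 0.2, 0.9],
}
image_emb = [0.2, 0.8, 0.1]

label, scores = zero_shot_classify(image_emb, class_text_embs)
print(label)
```

Because the class list is just a dictionary of text embeddings, the same function covers scene classes and finer-grained attribute labels without retraining.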
Abstract
Remote sensing imagery, despite its broad applications in helping achieve Sustainable Development Goals and tackle climate change, has not yet benefited from the recent advancements of versatile, task-agnostic vision language models (VLMs). A key reason is that the large-scale, semantically diverse image-text dataset required for developing VLMs is still absent for remote sensing images. Unlike natural images, remote sensing images and their associated text descriptions cannot be efficiently collected from the public Internet at scale. In this work, we bridge this gap by using geo-coordinates to automatically connect open, unlabeled remote sensing images with rich semantics covered in OpenStreetMap, and thus construct SkyScript, a comprehensive vision-language dataset for remote sensing images, comprising 2.6 million image-text pairs covering 29K distinct semantic tags. With continual…
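The core idea in the abstract, using geo-coordinates to attach OpenStreetMap semantics to unlabeled images, can be sketched as a spatial lookup. This is a minimal illustration under stated assumptions, not the paper's pipeline: the `Tile` and `MapObject` types, the point-in-bounding-box test, and the caption format are all hypothetical, and the coordinates and tags are made-up sample data.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    """Geographic bounding box of one image, in degrees (hypothetical type)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        return self.west <= lon <= self.east and self.south <= lat <= self.north

@dataclass
class MapObject:
    """A map feature reduced to a point with key-value tags (hypothetical type)."""
    lon: float
    lat: float
    tags: dict = field(default_factory=dict)

def tags_for_tile(tile, objects):
    """Collect (key, value) tags of every object located inside the tile."""
    matched = []
    for obj in objects:
        if tile.contains(obj.lon, obj.lat):
            matched.extend(sorted(obj.tags.items()))
    return matched

def caption(tags):
    """Turn matched tags into a simple text description (assumed format)."""
    return ", ".join(f"{key}: {value}" for key, value in tags)

# Made-up sample data for illustration only.
tile = Tile(west=-122.18, south=37.42, east=-122.16, north=37.44)
objects = [
    MapObject(-122.17, 37.43, {"leisure": "stadium", "sport": "soccer"}),
    MapObject(-122.10, 37.40, {"landuse": "farmland"}),  # outside the tile
]

text = caption(tags_for_tile(tile, objects))
print(text)
```

Each image then pairs with the generated text, giving an image-text example without manual annotation.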
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Genomics and Phylogenetic Studies
