Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images

Aayush Dhakal; Adeel Ahmad; Subash Khanal; Srikumar Sastry; Hannah Kerner; Nathan Jacobs

arXiv:2307.15904 · cs.CV · April 15, 2024

Open Access

TL;DR

Sat2Cap introduces a weakly supervised, zero-shot mapping method that uses contrastive learning on a large dataset to generate fine-grained textual descriptions from satellite images, enabling flexible and scalable geographic mapping.

Contribution

This work presents Sat2Cap, a contrastive learning framework trained on 6.1 million pairs of overhead and ground-level images. It supports zero-shot mapping of free-form textual descriptions from satellite images without requiring text labels during training.

Findings

01 Captures ground-level concepts from satellite imagery.

02 Enables large-scale, fine-grained textual mapping.

03 Models concepts that vary over time at a given location.

Abstract

We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this work of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embeddings of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that…
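The contrastive setup described in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss that pulls each overhead-image embedding toward the CLIP embedding of its paired ground-level image, and it assumes that a text query is mapped by cosine similarity against the predicted per-location embeddings. The function names, the temperature value, and the similarity-based scoring are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(overhead_emb, ground_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of paired embeddings (assumed form).

    overhead_emb: (N, D) embeddings predicted from overhead images.
    ground_emb:   (N, D) CLIP embeddings of the paired ground-level images.
    Row i of each array is a positive pair; all other rows act as negatives.
    """
    a = l2_normalize(overhead_emb)
    b = l2_normalize(ground_emb)
    logits = a @ b.T / temperature  # (N, N) scaled cosine similarities
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct match for row i is column i.
    return float(-np.mean(np.diag(log_probs)))


def similarity_map(location_embs, text_emb):
    """Cosine similarity of one text embedding against every location embedding.

    location_embs: (N, D) predicted embeddings, one per map location.
    text_emb:      (D,) embedding of a free-form text query (hypothetical input).
    Returns an (N,) array of scores that could be rendered as a map.
    """
    return l2_normalize(location_embs) @ l2_normalize(text_emb)
```

In this sketch a batch whose overhead embeddings already match their ground-level targets yields a lower loss than a batch of unrelated embeddings, which is the behavior a contrastive objective is meant to reward.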

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · Contrastive Language-Image Pre-training