GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja; Muhammad Sohail Danish; Muzammal Naseer; Abhijit Das; Salman Khan; Fahad Shahbaz Khan

arXiv:2311.15826 · cs.CV · November 28, 2023 · 6 cites

TL;DR

GeoChat is a novel grounded large vision-language model specifically designed for remote sensing, enabling multitask conversations, region-specific dialogue, and object grounding in high-resolution RS images, addressing domain-specific challenges.

Contribution

It introduces the first versatile remote sensing VLM with multitask conversational abilities and a new RS multimodal instruction-following dataset, along with a comprehensive benchmark.

Findings

1. Robust zero-shot performance on RS tasks
2. Effective region-specific dialogue and object grounding
3. Outperforms baseline methods in RS multimodal understanding

Abstract

Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in Remote Sensing (RS) scenarios, producing inaccurate or fabricated information when presented with RS domain-specific queries. This behavior emerges due to the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction-following data, as well as of strong backbone models for RS, makes it hard for the models to align their behavior with user queries. To address these limitations, we propose GeoChat - the first versatile remote…

Code & Models

Repositories

mbzuai-oryx/geochat
PyTorch · Official

Models

MBZUAI/geochat-7B (Hugging Face)
model · 2.9k downloads · 24 likes

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: ALIGN