LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation

Zhenshi Li; Dilxat Muhtar; Feng Gu; Xueliang Zhang; Pengfeng Xiao; Guangjun He; Xiaoxiang Zhu

arXiv:2411.09301·cs.CV·November 15, 2024

TL;DR

LHRS-Bot-Nova is a specialized multimodal large language model for remote sensing that integrates enhanced vision encoding, a novel bridge layer, and large-scale datasets to improve understanding and interpretation of Earth's surface imagery.

Contribution

The paper introduces LHRS-Bot-Nova with an improved vision encoder, a novel bridge layer, and new datasets for better remote sensing image understanding and instruction following.

Findings

01 Superior performance on remote sensing tasks

02 Effective spatial recognition and instruction following

03 Reliable benchmark results for model comparison

Abstract

Automatically and rapidly understanding Earth's surface is fundamental to our grasp of the living environment and to informed decision-making. This underscores the need for a unified system with comprehensive capabilities for analyzing Earth's surface to address a wide range of human needs. The emergence of multimodal large language models (MLLMs) holds great potential for boosting the efficiency and convenience of intelligent Earth observation. These models can engage in human-like conversations, serve as unified platforms for understanding images, follow diverse instructions, and provide insightful feedback. In this study, we introduce LHRS-Bot-Nova, an MLLM specialized in understanding remote sensing (RS) images, designed to expertly perform a wide range of RS understanding tasks aligned with human instructions. LHRS-Bot-Nova features an enhanced vision encoder and a novel bridge layer,…

Code & Models

Repositories

NJU-LHRS/LHRS-Bot
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Text and Document Classification Technologies