LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum, Wen-Han Hsieh, Tsung-Han Wu, Ritwik Gupta, Trevor Darrell, David M. Chan

TL;DR
LISAt is a vision-language model for complex remote-sensing scene understanding. It can describe satellite imagery, answer questions about it, and segment objects of interest. It is trained on a new geospatial dataset and outperforms existing models.
Contribution
Introduces LISAt, a novel vision-language model for remote sensing, trained on the GRES (reasoning segmentation) and PreGRES (multimodal pretraining) datasets, with significant performance improvements over existing models.
Findings
LISAt outperforms RS-GPT4V by over 10% in BLEU-4 on description tasks.
LISAt surpasses state-of-the-art models by 143% in gIoU on segmentation.
The model and datasets are publicly available.
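The gIoU figure in the findings above is not defined on this page. In reasoning-segmentation work, gIoU usually means the mean of the per-image intersection-over-union scores. The sketch below assumes LISAt follows that convention. The function names `mask_iou` and `giou` and the nested-list mask format are illustrative choices, not taken from the paper's code.

```python
from typing import Sequence

# A 2D binary mask as nested sequences; 1 marks an object pixel (illustrative format).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def giou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of per-image IoU scores (the assumed meaning of gIoU)."""
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
    print(giou(preds, targets))
```

Because every image contributes equally to this average, small objects weigh as much as large ones. A cumulative variant that pools pixels across all images would instead favor large objects.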
Abstract
Segmentation models can recognize a pre-defined set of objects in images. However, models that can reason over complex user queries that implicitly refer to multiple objects of interest are still in their infancy. Recent advances in reasoning segmentation--generating segmentation masks from complex, implicit query text--demonstrate that vision-language models can operate across an open domain and produce reasonable outputs. However, our experiments show that such models struggle with complex remote-sensing imagery. In this work, we introduce LISAt, a vision-language model designed to describe complex remote-sensing scenes, answer questions about them, and segment objects of interest. We trained LISAt on a new curated geospatial reasoning-segmentation dataset, GRES, with 27,615 annotations over 9,205 images, and a multimodal pretraining dataset, PreGRES, containing over 1 million…
Taxonomy
Topics: Geological Modeling and Analysis · Geographic Information Systems Studies · Methane Hydrates and Related Phenomena
Methods: Sparse Evolutionary Training
