Towards Automatic Satellite Images Captions Generation Using Large Language Models
Yingxu He, Qiqi Sun

TL;DR
This paper introduces ARSIC, a novel method that leverages large language models to automatically generate captions for satellite images, addressing dataset scarcity and improving caption quality for remote sensing applications.
Contribution
The paper proposes a new approach that uses LLMs to automatically generate captions for satellite images, and it adapts existing captioning models to produce high-quality captions for remote sensing images.
Findings
Effective automatic caption collection for remote sensing images
Improved caption quality over conventional models
Demonstrated potential for large-scale dataset creation
Abstract
Automatic image captioning is a promising technique for conveying visual information using natural language. It can benefit various tasks in satellite remote sensing, such as environmental monitoring, resource management, and disaster management. However, one of the main challenges in this domain is the lack of large-scale image-caption datasets, as they require substantial human expertise and effort to create. Recent research on large language models (LLMs) has demonstrated their impressive performance in natural language understanding and generation tasks. Nonetheless, most of them (GPT-3.5, Falcon, Claude, etc.) cannot handle images, while conventional captioning models pre-trained on general ground-view images (BLIP, GIT, CM3, CM3Leon, etc.) often fail to produce detailed and accurate captions for aerial images. To address this problem, we propose a novel approach: Automatic Remote…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
