Multilingual Training-Free Remote Sensing Image Captioning
Carlos Rebelo, Gil Rocha, João Daniel Silva, Bruno Martins

TL;DR
This paper introduces a training-free, multilingual remote sensing image captioning method using retrieval-augmented prompting, which performs competitively across languages without relying on large annotated datasets.
Contribution
It presents the first training-free multilingual approach employing retrieval-augmented prompting and graph-based re-ranking for remote sensing image captioning.
Findings
Competitive performance with supervised English systems
Re-ranking with PageRank improves results by up to 35%
Direct caption generation in target languages outperforms translation-based methods
Abstract
Remote sensing image captioning has advanced rapidly through encoder-decoder models, although the reliance on large annotated datasets and the focus on English restrict global applicability. To address these limitations, we propose the first training-free multilingual approach, based on retrieval-augmented prompting. For a given aerial image, we employ a domain-adapted SigLIP2 encoder to retrieve related captions and few-shot examples from a datastore, which are then provided to a language model. We explore two variants: an image-blind setup, where a multilingual Large Language Model (LLM) generates the caption from textual prompts alone, and an image-aware setup, where a Vision-Language Model (VLM) jointly processes the prompt and the input image. To improve the coherence of the retrieved content, we introduce a graph-based re-ranking strategy using PageRank on a graph of images and…
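The pipeline described in the abstract (retrieve, re-rank with PageRank, then prompt a language model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the embeddings are toy vectors standing in for SigLIP2 outputs, the graph is built from cosine similarities between the query image and the retrieved captions (the abstract is truncated before it specifies the graph), and the names `retrieve`, `pagerank`, `rerank`, and `build_prompt` are hypothetical. The call to an LLM or VLM is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query_vec, datastore, k):
    """Return the k (caption, embedding) pairs most similar to the query."""
    ranked = sorted(datastore, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:k]


def pagerank(weights, damping=0.85, iters=50):
    """Power-iteration PageRank on a weighted adjacency matrix (list of lists)."""
    n = len(weights)
    rank = [1.0 / n] * n
    out = [sum(row) for row in weights]
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                # Nodes without outgoing weight spread their rank uniformly.
                share = weights[i][j] / out[i] if out[i] else 1.0 / n
                new[j] += damping * rank[i] * share
        rank = new
    return rank


def rerank(query_vec, candidates):
    """Order retrieved captions by PageRank score.

    Assumption: node 0 is the query image, the other nodes are the
    retrieved captions, and edge weights are clipped cosine similarities.
    """
    nodes = [query_vec] + [emb for _, emb in candidates]
    n = len(nodes)
    weights = [
        [max(cosine(nodes[i], nodes[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    order = sorted(range(1, n), key=lambda i: scores[i], reverse=True)
    return [candidates[i - 1][0] for i in order]


def build_prompt(captions, language):
    """Assemble a text-only prompt (image-blind variant) from re-ranked captions."""
    lines = ["Captions of similar aerial images:"]
    lines += [f"- {c}" for c in captions]
    lines.append(f"Write one caption in {language} for the new image.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy datastore: 2-D vectors stand in for real image/text embeddings.
    store = [
        ("An airport with several planes near the terminal.", [0.9, 0.1]),
        ("Planes parked beside a runway.", [0.8, 0.2]),
        ("A dense forest crossed by a river.", [0.1, 0.9]),
    ]
    top = retrieve([1.0, 0.0], store, k=2)
    print(build_prompt(rerank([1.0, 0.0], top), "Portuguese"))
```

In the image-aware variant the same prompt would be passed to a VLM together with the input image. Setting the target language in the prompt reflects the finding that generating captions directly in the target language outperforms translating them afterwards.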
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
