Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery
Md Adnan Arefeen, Biplob Debnath, Ravi K. Rajendran, Murugan Sankaradas, Srimat T. Chakradhar

TL;DR
Open-SAT introduces a training-free, inference-time algorithm that refines query embeddings using LLMs and VLMs to improve open-vocabulary satellite image retrieval accuracy.
Contribution
It presents a novel inference-time embedding refinement method leveraging LLMs and VLMs, eliminating the need for additional training.
Findings
Improves F1 score by up to 16.04% in satellite image retrieval.
Operates without additional training or supervision.
Enhances retrieval accuracy and efficiency in open-vocabulary scenarios.
Abstract
In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant image tiles, as the retrieval system must generalize to a wide range of unseen objects and concepts. While vision-language models (VLMs) such as CLIP are widely used for text-image retrieval, even fine-tuned variants often struggle to accurately align such queries with satellite imagery. To address this, we propose Open-SAT, a training-free query embedding refinement algorithm that operates at inference time to improve alignment between user queries and satellite image content. Open-SAT uses VLMs to compute embeddings for image tiles, which are stored in a vector database for efficient retrieval. At query time, it leverages Large Language Models (LLMs) to refine…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
