TL;DR
This paper systematically studies semantic localization in remote sensing, proposing new evaluation metrics, creating a benchmark dataset, and analyzing model performance to advance the understanding and application of multi-modal semantic localization.
Contribution
It introduces comprehensive evaluation metrics, a diverse test dataset, and a detailed benchmark for the semantic localization task in remote sensing, filling a significant research gap.
Findings
Proposed new metrics for pixel-level and region-level evaluation.
Created AIR-SLT, a large-scale, multi-objective test dataset.
Analyzed the impact of variables on model performance.
Abstract
Semantic localization (SeLo) refers to the task of obtaining the most relevant locations in large-scale remote sensing (RS) images using semantic information such as text. As an emerging task based on cross-modal retrieval, SeLo achieves semantic-level retrieval with only caption-level annotation, which demonstrates its great potential in unifying downstream tasks. Although SeLo has been studied in successive works, no work has yet systematically explored and analyzed this urgent direction. In this paper, we thoroughly study this field and provide a complete benchmark in terms of metrics and test data to advance the SeLo task. First, based on the characteristics of this task, we propose multiple discriminative evaluation metrics to quantify the performance of the SeLo task. The devised significant area proportion, attention shift distance, and discrete attention…
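To make the task concrete, here is a minimal sketch of how a SeLo-style relevance map could be built from a cross-modal retrieval model: slide a window over the large image, score each crop against the query text, and average the scores per pixel. This is an illustration under stated assumptions, not the paper's implementation. The function `selo_heatmap`, the `window` and `stride` parameters, and the toy mean-color embedder are all hypothetical; a real system would use a trained image-text retrieval model in place of `embed_crop`.

```python
import numpy as np

def selo_heatmap(image, text_vec, embed_crop, window=64, stride=32):
    """Average crop-text cosine similarity per pixel (hypothetical sketch)."""
    h, w = image.shape[:2]
    score_sum = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    t = text_vec / (np.linalg.norm(text_vec) + 1e-12)
    for y in range(0, max(h - window, 0) + 1, stride):
        for x in range(0, max(w - window, 0) + 1, stride):
            crop = image[y:y + window, x:x + window]
            v = embed_crop(crop)
            v = v / (np.linalg.norm(v) + 1e-12)
            sim = float(v @ t)  # cosine similarity of crop and query
            score_sum[y:y + window, x:x + window] += sim
            counts[y:y + window, x:x + window] += 1
    # Pixels not covered by any window keep a score of 0.
    return score_sum / np.maximum(counts, 1)

# Toy stand-in for a learned image encoder (assumption): mean RGB of the crop.
def mean_color_embed(crop):
    return crop.reshape(-1, crop.shape[-1]).mean(axis=0)

# Synthetic 128x128 "scene": blue background with a red patch at bottom-left.
scene = np.zeros((128, 128, 3), dtype=np.float64)
scene[..., 2] = 1.0
scene[64:128, 0:64] = [1.0, 0.0, 0.0]

query_red = np.array([1.0, 0.0, 0.0])  # stand-in for the embedding of "red"
heat = selo_heatmap(scene, query_red, mean_color_embed)
peak_y, peak_x = np.unravel_index(np.argmax(heat), heat.shape)
```

With this toy setup the peak of `heat` falls inside the red patch, which is the behavior a SeLo map is meant to show: only a caption-level query is given, yet the output is a location-level relevance map. Evaluation metrics such as those named in the abstract would then be computed on a map of this kind.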
Taxonomy
Methods: Test
