Self-Supervised Cross-Modal Text-Image Time Series Retrieval in Remote Sensing
Genc Hoxha, Olivér Angyal, Begüm Demir

TL;DR
This paper introduces a self-supervised cross-modal retrieval method for remote sensing that retrieves image time series from text queries (and vice versa), removing the requirement of unimodal methods that the user already has a query image time series.
Contribution
It is the first to propose cross-modal text-image time series retrieval in remote sensing, utilizing modality-specific encoders and fusion strategies for temporal modeling.
Findings
Effective retrieval of semantically relevant bitemporal images and texts
Outperforms existing unimodal retrieval methods on benchmark datasets
Demonstrates the utility of transformer-based fusion for temporal information modeling
Abstract
The development of image time series retrieval (ITSR) methods is a growing research interest in remote sensing (RS). Given a user-defined image time series (i.e., the query time series), ITSR methods search and retrieve from large archives the image time series that have similar content to the query time series. Existing ITSR methods in RS are designed for unimodal retrieval problems, relying on the assumption that users always have access to a query image time series in the considered image modality. In operational scenarios, this assumption may not hold. To overcome this issue, for the first time in RS we introduce the task of cross-modal text-image time series retrieval (text-ITSR). In detail, we present a self-supervised cross-modal text-ITSR method that enables the retrieval of image time series using text sentences as queries, and vice versa. We focus our attention on text-ITSR in…
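The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that the text query and each archived image time series have already been mapped into a shared embedding space, and that relevance is scored with cosine similarity. Both are assumptions made for illustration: this summary does not state the paper's encoders, fusion module, or similarity measure, and the names `cosine_similarity`, `rank_archive`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_archive(query_embedding, archive_embeddings, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first.

    archive_embeddings maps an item id to its embedding. The same function
    serves both directions: text query over image time series, or an image
    time series query over text sentences.
    """
    scored = [
        (item_id, cosine_similarity(query_embedding, embedding))
        for item_id, embedding in archive_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder outputs (hypothetical values).
text_query = [0.9, 0.1, 0.0]
image_series_archive = {
    "series_a": [1.0, 0.0, 0.0],
    "series_b": [0.0, 1.0, 0.0],
    "series_c": [0.7, 0.7, 0.0],
}

results = rank_archive(text_query, image_series_archive, top_k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

With these toy vectors, `series_a` ranks first and `series_c` second. In a real system the embeddings would come from trained modality-specific encoders, and a large archive would typically use an approximate nearest-neighbour index instead of a full scan.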
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: Softmax · Attention Is All You Need · ALIGN · Focus
