Temporal Cross-Media Retrieval with Soft-Smoothing
David Semedo, João Magalhães

TL;DR
This paper introduces a novel neural architecture for cross-media retrieval that explicitly models temporal correlations across visual and textual modalities, improving retrieval performance on time-sensitive multimedia data.
Contribution
It proposes a temporal subspace learning approach with soft temporal and inter-modality constraints. Unlike standard cross-media methods, it explicitly incorporates time into the retrieval model.
Findings
Outperforms baseline models on three datasets
Highlights importance of temporal information in cross-media retrieval
Demonstrates effectiveness of temporal subspace learning
Abstract
Multimedia information has strong temporal correlations that shape the way modalities co-occur over time. In this paper we study the dynamic nature of multimedia and social-media information, where the temporal dimension emerges as a strong source of evidence for learning the temporal correlations across visual and textual modalities. So far, cross-media retrieval models have explored the correlations between different modalities (e.g. text and image) to learn a common subspace in which semantically similar instances lie in the same neighbourhood. Building on such knowledge, we propose a novel temporal cross-media neural architecture that departs from standard cross-media methods by explicitly accounting for the temporal dimension through temporal subspace learning. The model is softly-constrained with temporal and inter-modality constraints that guide the new subspace learning task by…
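To make the intuition concrete, here is a minimal sketch of how temporal proximity could softly influence cross-media ranking in a shared subspace. It is not the paper's architecture: the paper builds its constraints into subspace learning, whereas this toy example only re-weights similarity at scoring time. The embeddings, the Gaussian kernel, and the names `temporal_weight`, `soft_temporal_score`, `alpha`, and `bandwidth` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors in the (assumed) common subspace."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def temporal_weight(t_a, t_b, bandwidth=1.0):
    """Hypothetical soft temporal kernel: 1.0 at equal timestamps, decaying with the gap."""
    return math.exp(-((t_a - t_b) ** 2) / (2.0 * bandwidth ** 2))

def soft_temporal_score(query, item, alpha=0.5, bandwidth=1.0):
    """Blend semantic similarity with temporal proximity.

    alpha=0 ignores time entirely; alpha=1 scales similarity fully by the kernel.
    This blending rule is an illustrative assumption, not the paper's loss.
    """
    sim = cosine(query["emb"], item["emb"])
    w = temporal_weight(query["t"], item["t"], bandwidth)
    return sim * ((1.0 - alpha) + alpha * w)

def retrieve(query, items, k=2, **kw):
    """Return the ids of the k highest-scoring items for the query."""
    ranked = sorted(items, key=lambda it: soft_temporal_score(query, it, **kw),
                    reverse=True)
    return [it["id"] for it in ranked[:k]]

# Toy data: a text query and three image embeddings with timestamps.
text_query = {"emb": [1.0, 0.0], "t": 10.0}
images = [
    {"id": "img_far", "emb": [1.0, 0.0], "t": 20.0},   # same direction, distant in time
    {"id": "img_near", "emb": [0.9, 0.1], "t": 10.5},  # close direction, close in time
    {"id": "img_off", "emb": [0.0, 1.0], "t": 10.0},   # orthogonal, same time
]

print(retrieve(text_query, images, k=2))             # time-aware ranking
print(retrieve(text_query, images, k=2, alpha=0.0))  # time-agnostic ranking
```

With `alpha=0.0` the ranking depends on semantic similarity alone, so `img_far` comes first. With the default `alpha=0.5` the temporally close `img_near` overtakes it, which is the kind of effect a soft temporal constraint is meant to encourage.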
