Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati, and Faisal Shafait

TL;DR
This paper introduces SemanticMap, a new evaluation measure for cross-modal retrieval that considers semantic similarity, and proposes a novel single-stream neural network system trained with extended center loss for bidirectional image-text retrieval, showing promising results.
Contribution
It presents the first single-stream network for cross-modal retrieval and a semantic-aware evaluation measure, improving generalization to unseen data.
Findings
SemanticMap effectively evaluates semantic similarity in latent space.
The single-stream network achieves comparable results to state-of-the-art methods.
The approach improves generalization to unseen data in cross-modal retrieval.
Abstract
Current cross-modal retrieval systems are evaluated using the R@K measure, which does not leverage semantic relationships but instead strictly follows the manually marked image-text query pairs. Therefore, current systems do not generalize well to unseen data in the wild. To handle this, we propose a new measure, SemanticMap, to evaluate the performance of cross-modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross-modal retrieval system using a single-stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images, which enabled us to use a…
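To make the abstract's criticism of R@K concrete, here is a minimal sketch of how R@K is commonly computed. It is an illustration, not the authors' evaluation code. The function name `recall_at_k`, the toy score matrix, and the assumption that candidate i is the single annotated match for query i are all hypothetical choices made for this example.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose annotated match appears in the top-k results.

    similarity[i][j] is the score between query i and candidate j.
    Assumption for this sketch: candidate i is the only annotated match
    for query i (one manually marked pair per query).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(similarity) == 0:
        raise ValueError("similarity must contain at least one query")

    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score (ties keep index order).
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for 3 text queries against 3 images (hypothetical values).
scores = [
    [0.9, 0.1, 0.3],  # query 0: annotated image is ranked first
    [0.8, 0.6, 0.7],  # query 1: annotated image is ranked third
    [0.2, 0.5, 0.4],  # query 2: annotated image is ranked second
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(scores, k):.3f}")
```

In this toy case, query 1 counts as a miss at K=1 and K=2 no matter how semantically close its top-ranked images are to the query, because only the annotated pair is credited. That strictness is the behavior the abstract says R@K has and that SemanticMap is proposed to address.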
