Cross-media Similarity Metric Learning with Unified Deep Networks
Jinwei Qi, Xin Huang, and Yuxin Peng

TL;DR
This paper introduces UNCSM (Unified Network for Cross-media Similarity Metric), a unified deep network framework that jointly learns shared representations and a similarity metric for cross-media retrieval, and reports higher accuracy than 8 state-of-the-art methods on 4 datasets.
Contribution
The paper proposes a novel unified deep network that combines shared representation learning with a learned similarity metric for cross-media retrieval.
Findings
Outperforms 8 state-of-the-art methods on 4 datasets
Effectively models both similar and dissimilar constraints
Unifies representation and metric learning for better retrieval accuracy
Abstract
As a highlighting research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning better shared representation and distance metric for multimedia data is important to boost the cross-media retrieval. Motivated by the strong ability of deep neural network in feature representation and comparison functions learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric in a unified framework. First, we design a two-pathway deep network pretrained with contrastive loss, and employ double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling the relative semantic similarity. Second, the metric network is designed for effectively calculating the cross-media similarity of…
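The abstract names two training objectives, a contrastive loss for pretraining and a double triplet similarity loss for fine-tuning, but the excerpt gives no formulas for them. The sketch below uses the standard textbook forms of the contrastive and triplet losses to illustrate the idea; it is not the authors' formulation. Reading "double" as one triplet anchored in each media type is an assumption, as are the function names, the squared Euclidean distance, and the margin value.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_loss(x, y, similar, margin=1.0):
    """Textbook contrastive loss on one cross-media pair (assumed form).

    Similar pairs are pulled together; dissimilar pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    if similar:
        return sq_dist(x, y)
    return max(0.0, margin - math.sqrt(sq_dist(x, y))) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (assumed form)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def double_triplet_loss(img, txt_pos, txt_neg, txt, img_pos, img_neg, margin=1.0):
    """Hypothetical 'double' triplet: one triplet anchored on an image
    embedding and one anchored on a text embedding, summed."""
    return (triplet_loss(img, txt_pos, txt_neg, margin)
            + triplet_loss(txt, img_pos, img_neg, margin))


# Toy 2-D embeddings standing in for the outputs of the two pathways.
image_vec = [0.0, 0.0]
text_same_class = [0.0, 0.5]
text_other_class = [3.0, 4.0]

pair_loss_similar = contrastive_loss(image_vec, text_same_class, similar=True)
pair_loss_dissimilar = contrastive_loss(image_vec, text_other_class, similar=False)
```

In this toy setup the dissimilar pair already lies beyond the margin, so it contributes no loss, while the similar pair is penalized in proportion to its squared distance.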
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
