Cross-media Similarity Metric Learning with Unified Deep Networks

Jinwei Qi, Xin Huang, and Yuxin Peng

arXiv:1704.04333·cs.MM·April 17, 2017·1 cites

Open Access

TL;DR

This paper introduces UNCSM, a unified deep network framework that jointly learns shared representations and a similarity metric for cross-media retrieval, significantly improving accuracy over existing methods.

Contribution

The paper proposes a novel unified deep network that combines shared representation learning with a learned similarity metric for cross-media retrieval.

Findings

01. Outperforms 8 state-of-the-art methods on 4 datasets

02. Effectively models both similar and dissimilar constraints

03. Unifies representation and metric learning for better retrieval accuracy

Abstract

As a prominent research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning a better shared representation and distance metric for multimedia data is important for improving cross-media retrieval. Motivated by the strong ability of deep neural networks to learn feature representations and comparison functions, we propose the Unified Network for Cross-media Similarity Metric (UNCSM), which combines cross-media shared representation learning and distance metric learning in a unified framework. First, we design a two-pathway deep network pretrained with a contrastive loss, and employ a double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling the relative semantic similarity. Second, the metric network is designed for effectively calculating the cross-media similarity of…
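
The two loss functions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulations, so what follows uses the standard textbook contrastive and triplet losses on Euclidean distance, and reads "double triplet" as one triplet anchored on the image side plus one anchored on the text side. That reading, the function names, and the `margin` parameter are all assumptions for illustration, not the authors' definitions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(img, txt, similar, margin=1.0):
    """Textbook contrastive loss for one image/text embedding pair.

    Similar pairs are pulled together (squared distance); dissimilar
    pairs are pushed apart until they are at least `margin` away.
    """
    d = euclidean(img, txt)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (squared distances)."""
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, margin + d_pos - d_neg)


def double_triplet_loss(img, txt_pos, txt_neg, txt, img_pos, img_neg, margin=1.0):
    """ASSUMED reading of "double triplet": sum of an image-anchored
    triplet and a text-anchored triplet. Not taken from the paper."""
    return (triplet_loss(img, txt_pos, txt_neg, margin)
            + triplet_loss(txt, img_pos, img_neg, margin))
```

In this sketch the contrastive loss only uses absolute similar/dissimilar labels, while the triplet terms encode relative similarity (the matching item should be closer than the non-matching one), which is the distinction the abstract draws between pretraining and fine-tuning.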

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications