CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network

Yuxin Peng, Jinwei Qi, Xin Huang, Yuxin Yuan

arXiv:1704.02116 · cs.MM · August 9, 2017 · 5 cites

TL;DR

This paper introduces CCL, a hierarchical network that improves cross-modal retrieval by modeling multi-grained intra- and inter-modality correlations within a multi-stage, multi-task learning framework, outperforming 13 state-of-the-art methods on 6 datasets.

Contribution

The paper proposes a hierarchical network with multi-grained fusion and multi-level association that addresses the limitations of existing two-stage cross-modal retrieval methods.

Findings

1. Achieves the best performance on 6 datasets compared with 13 state-of-the-art methods.

2. Effectively models intra- and inter-modality correlations through multi-grained fusion.

3. Uses multi-task learning to balance semantic and similarity constraints.
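The idea of balancing a semantic constraint against a similarity constraint can be illustrated with a small multi-task loss. This is a minimal sketch under stated assumptions, not the paper's loss: it assumes the semantic constraint is a softmax cross-entropy over category labels, the similarity constraint is a margin-based contrastive loss on paired image/text vectors in the common space, and the two are mixed by a hypothetical weight `alpha`. All function names and the default margin are illustrative.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Assumed semantic constraint: cross-entropy of one sample's class logits."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def contrastive_loss(x: Sequence[float], y: Sequence[float],
                     matched: bool, margin: float = 1.0) -> float:
    """Assumed similarity constraint: pull matched image/text pairs together and
    push unmatched pairs at least `margin` apart (Euclidean distance)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    if matched:
        return d ** 2
    return max(0.0, margin - d) ** 2


def multitask_loss(img_repr: Sequence[float], txt_repr: Sequence[float],
                   img_logits: Sequence[float], txt_logits: Sequence[float],
                   label: int, matched: bool, alpha: float = 0.5) -> float:
    """Hypothetical weighted sum: alpha * semantic + (1 - alpha) * similarity."""
    semantic = (softmax_cross_entropy(img_logits, label)
                + softmax_cross_entropy(txt_logits, label))
    similarity = contrastive_loss(img_repr, txt_repr, matched)
    return alpha * semantic + (1.0 - alpha) * similarity
```

Setting `alpha` near 1 lets category supervision dominate, while values near 0 emphasize pairwise alignment between modalities; a weighted sum is only one simple way to combine the two objectives.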

Abstract

Cross-modal retrieval, which retrieves across multimedia data such as images and text, has become a prominent research topic. Most existing methods based on deep neural networks (DNNs) adopt a two-stage learning framework: the first stage generates a separate representation for each modality, and the second stage learns a cross-modal common representation. However, existing methods have three limitations: (1) in the first stage, they model only intra-modality correlation and ignore inter-modality correlation, which carries rich complementary context; (2) in the second stage, they adopt only shallow networks with single-loss regularization and ignore the intrinsic relevance of intra-modality and inter-modality correlation; (3) only original instances are considered, while the complementary fine-grained clues provided by their patches are…
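The generic two-stage framework described in the abstract, extended with patch-level (fine-grained) clues, can be sketched as a toy pipeline. This is an illustration under stated assumptions, not CCL's architecture: it assumes patch features are mean-pooled and concatenated with the instance feature (stage 1, per modality), that each modality reaches the common space through a single linear projection with hand-picked weights (stage 2), and that retrieval ranks candidates by cosine similarity. All function names, weights, and feature values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(patches: List[Vector]) -> Vector:
    """Average fine-grained patch features into a single vector."""
    n = len(patches)
    return [sum(col) / n for col in zip(*patches)]


def fuse(instance: Vector, patches: List[Vector]) -> Vector:
    """Stage 1 (per modality, assumed): concatenate the instance-level
    feature with the pooled patch-level feature."""
    return instance + mean_pool(patches)


def project(W: Matrix, v: Vector) -> Vector:
    """Stage 2 (assumed): linear map of a fused feature into the common space."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity used here to rank cross-modal candidates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Usage: one image and one text item, each with an instance feature and two patch features.
img_fused = fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
txt_fused = fuse([0.8, 0.2], [[1.0, 0.0], [1.0, 0.0]])
W_img = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # hypothetical projection weights
W_txt = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0]]  # hypothetical projection weights
score = cosine(project(W_img, img_fused), project(W_txt, txt_fused))
```

In a real system both projections would be learned networks rather than fixed matrices; the sketch only shows where fine-grained patch information enters before the common representation is formed.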

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques