Semantic Modeling of Textual Relationships in Cross-Modal Retrieval
Jing Yu, Chenghao Yang, Zengchang Qin, Zhuoqian Yang, Yue Hu, and Weifeng Zhang

TL;DR
This paper introduces a novel cross-modal retrieval model that represents textual relationships as a featured graph and uses a dual-path neural network to improve the measurement of semantic similarity between texts and images.
Contribution
It proposes a relation-aware text representation using GCNs and a joint learning framework for multi-modal features, outperforming existing models.
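Jointly learning the two representations means training both paths against a shared similarity objective. The summary does not state which loss the authors use, so the sketch below assumes a common choice: cosine similarity between a text embedding and an image embedding, with a hinge (margin) loss that asks a matching image to score higher than a non-matching one. The function names and the `margin` value are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pairwise_margin_loss(
    text_emb: Sequence[float],
    img_pos: Sequence[float],
    img_neg: Sequence[float],
    margin: float = 0.2,  # hypothetical value
) -> float:
    """Hinge loss: the matching image should beat the non-matching one by `margin`."""
    return max(0.0, margin - cosine(text_emb, img_pos) + cosine(text_emb, img_neg))


# Usage: a well-separated pair incurs no loss; a swapped pair is penalized.
print(pairwise_margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
print(pairwise_margin_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

Because the loss depends on both embeddings, gradients from it would flow into the text (GCN) path and the image (CNN) path at the same time, which is what "joint" learning refers to here.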
Findings
Improves accuracy by 3.4% and 6.3% over prior models on two benchmark datasets.
Effectively models semantic, co-occurrence, and prior relations in text.
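To make the three relation types concrete, the sketch below builds a word graph whose edge set is the union of (a) semantic edges between words whose embedding vectors are close, (b) co-occurrence edges between words that appear near each other often enough, and (c) prior edges supplied from a knowledge base. The thresholds, window size, and the `kb_pairs` input are hypothetical placeholders; the paper's actual construction rules and knowledge base may differ.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def _edge(a: str, b: str) -> Edge:
    """Undirected edge stored in a canonical (sorted) order."""
    return (a, b) if a < b else (b, a)


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def semantic_edges(vectors: Dict[str, List[float]], threshold: float = 0.8) -> Set[Edge]:
    """Connect words whose (hypothetical) embedding vectors are similar enough."""
    return {
        _edge(a, b)
        for a, b in combinations(sorted(vectors), 2)
        if _cosine(vectors[a], vectors[b]) >= threshold
    }


def cooccurrence_edges(docs: List[List[str]], window: int = 2, min_count: int = 2) -> Set[Edge]:
    """Connect words that co-occur within `window` tokens at least `min_count` times."""
    counts: Dict[Edge, int] = {}
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    e = _edge(w, v)
                    counts[e] = counts.get(e, 0) + 1
    return {e for e, c in counts.items() if c >= min_count}


def build_featured_graph(
    docs: List[List[str]],
    vectors: Dict[str, List[float]],
    kb_pairs: Iterable[Edge],  # hypothetical prior relations from a knowledge base
) -> Dict[str, Set[str]]:
    """Union the three edge sets into an adjacency map (word -> neighbours)."""
    edges = semantic_edges(vectors) | cooccurrence_edges(docs) | {_edge(a, b) for a, b in kb_pairs}
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph
```

Keeping each relation type as a separate edge set makes it easy to ablate one view at a time, for example to check how much the knowledge-base edges contribute.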
Outperforms state-of-the-art models in cross-modal retrieval.
Abstract
Feature modeling of different modalities is a basic problem in current research on cross-modal information retrieval. Existing models typically project texts and images into a single embedding space, in which semantically similar items lie closer together. Semantic modeling of textual relationships is notoriously difficult. In this paper, we propose an approach to model texts using a featured graph that integrates multi-view textual relationships, including semantic relations, statistical co-occurrence, and prior relations in the knowledge base. A dual-path neural network is adopted to jointly learn multi-modal representations of information and a cross-modal similarity measure. We use a Graph Convolutional Network (GCN) to generate relation-aware text representations, and a Convolutional Neural Network (CNN) with non-linearities for image representations. The cross-modal…
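The dual-path idea can be sketched as two independent encoders that meet at a similarity score. For the text path, the sketch below uses the widely known first-order GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) followed by mean pooling over word nodes; this is a stand-in, since the abstract does not say which GCN variant the authors use. For the image path, it assumes a precomputed CNN feature vector passed through one dense layer with a ReLU. All weights and function names are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^-1/2 (A + I) D^-1/2: add self-loops, then symmetrically normalize by degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: each word node mixes features from its neighbours."""
    return relu(matmul(matmul(normalized_adjacency(adj), h), w))


def mean_pool(h: Matrix) -> List[float]:
    """Collapse node features into a single text vector."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


def image_embedding(cnn_features: List[float], w: Matrix) -> List[float]:
    """Image path: a dense layer with ReLU over (assumed precomputed) CNN features."""
    return relu(matmul([cnn_features], w))[0]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


# Usage: two connected word nodes with 2-d features, identity weights on both paths.
adj = [[0.0, 1.0], [1.0, 0.0]]
node_feats = [[1.0, 0.0], [3.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
text_vec = mean_pool(gcn_layer(adj, node_feats, identity))
img_vec = image_embedding([4.0, 0.0], identity)
print(text_vec, img_vec, cosine(text_vec, img_vec))
```

In a trained model the two weight matrices would be learned from matched and unmatched text-image pairs so that the cosine score ranks relevant images above irrelevant ones.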
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
