A Universal Model for Cross Modality Mapping by Relational Reasoning
Zun Li, Congyan Lang, Liqian Liang, Tao Wang, Songhe Feng, Jun Wu, and Yidong Li

TL;DR
This paper introduces a universal graph-based relational reasoning network that models intra- and inter-instance relations to improve cross modality mapping across diverse tasks like image classification, social recommendation, and sound recognition.
Contribution
It proposes a GCN-based Relational Reasoning Network (RR-Net) that explicitly models intra- and inter-relations for cross modality mapping, relations that previous similarity-based methods leave insufficiently explored.
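As a rough illustration of the contrast drawn here, the sketch below puts a similarity-based baseline (cosine similarity between features in a shared space) next to one generic graph-convolution step over an intra-modality graph. It is not RR-Net itself: the propagation rule is the standard normalized-adjacency form commonly used in GCNs, with learned weights and nonlinearity omitted, and the names `cosine_similarity`, `gcn_propagate`, `image_feats`, `text_feat`, and `intra_adj`, along with the toy values, are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Similarity-based mapping: score a pair of features in a common space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gcn_propagate(adj, feats):
    """One weight-free graph-convolution step: D^{-1/2} (A + I) D^{-1/2} X.

    Generic GCN propagation, used here only to show how neighbors in an
    intra-modality graph can influence each instance's representation.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            # Symmetric degree normalization of the edge weight.
            w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
            for k in range(dim):
                row[k] += w * feats[j][k]
        out.append(row)
    return out


# Hypothetical toy data: two image features and one text feature.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
text_feat = [1.0, 0.0]

# Baseline: each image is scored against the text independently.
baseline_scores = [cosine_similarity(f, text_feat) for f in image_feats]

# Relational variant: an assumed edge links the two images, so each
# representation is mixed with its neighbor's before scoring.
intra_adj = [[0.0, 1.0], [1.0, 0.0]]
smoothed_feats = gcn_propagate(intra_adj, image_feats)
relational_scores = [cosine_similarity(f, text_feat) for f in smoothed_feats]
```

In the baseline, each score depends only on the pair being compared. After propagation, the score for one image also reflects the instances it is connected to, which is the kind of intra-modality information the contribution statement says similarity-based methods underuse.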
Findings
Outperforms existing methods on multiple tasks
Demonstrates universality across different modalities
Effectively models complex relational structures
Abstract
With the aim of matching a pair of instances from two different modalities, cross modality mapping has attracted growing attention in the computer vision community. Existing methods usually formulate the mapping function as the similarity measure between the pair of instance features, which are embedded to a common space. However, we observe that the relationships among the instances within a single modality (intra relations) and those between the pair of heterogeneous instances (inter relations) are insufficiently explored in previous approaches. Motivated by this, we redefine the mapping function with relational reasoning via graph modeling, and further propose a GCN-based Relational Reasoning Network (RR-Net) in which inter and intra relations are efficiently computed to universally resolve the cross modality mapping problem. Concretely, we first construct two kinds of graph, i.e.,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Advanced Image and Video Retrieval Techniques
