Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation
Cuicui Kang, Shengcai Liao, Yonghao He, Jian Wang, Wenjia Niu, Shiming Xiang, Chunhong Pan

TL;DR
This paper introduces a novel low-rank bilinear similarity learning method for cross-modal retrieval, effectively addressing heterogeneity and dimensionality issues between different media modalities, and demonstrating superior performance on benchmark datasets.
Contribution
It proposes a new low-rank bilinear formulation with nuclear-norm penalization for cross-modal similarity learning, improving over existing metric learning approaches.
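The nuclear norm ||W||_* is the sum of the singular values of W, so penalizing it favors a bilinear matrix W with few non-zero singular values, i.e. a low-rank W. Below is a minimal NumPy sketch of the penalty and of its proximal operator (singular value thresholding), which proximal-gradient solvers apply at every step. The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def nuclear_norm(W: np.ndarray) -> float:
    """||W||_*: the sum of the singular values of W."""
    return float(np.linalg.svd(W, compute_uv=False).sum())


def prox_nuclear(W: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_* (singular value thresholding).

    Returns argmin_Z 0.5 * ||Z - W||_F^2 + tau * ||Z||_*.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # singular values below tau are zeroed
    return (U * s_shrunk) @ Vt            # rescale columns of U, then recombine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((6, 4))       # hypothetical 6-dim x 4-dim bilinear matrix
    for tau in (0.0, 1.0, 2.0):
        Z = prox_nuclear(W, tau)
        print(tau, np.linalg.matrix_rank(Z), round(nuclear_norm(Z), 3))
```

Increasing tau shrinks every singular value by the same amount, so the small ones reach zero first and the rank of the result can only stay the same or drop.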
Findings
Achieves state-of-the-art results on image-text retrieval datasets.
Uses the accelerated proximal gradient (APG) algorithm, which converges at an O(1/t²) rate.
Effectively handles heterogeneity and dimensionality in cross-modal features.
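The accelerated proximal gradient step can be sketched as a FISTA-style loop (Beck and Teboulle's accelerated scheme) on an objective of the form f(W) + λ||W||_*. This summary does not state the paper's exact loss, so the smooth term f here is an assumed pairwise logistic loss over labeled (image, text) pairs, with t_k = +1 for matching and -1 for non-matching pairs; all function names and the toy data are hypothetical. Each iteration takes a gradient step on f, applies singular value thresholding, then extrapolates with Nesterov momentum.

```python
import numpy as np


def prox_nuclear(W: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: the prox of tau * ||W||_*."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def logistic_loss_and_grad(W, X, Y, t):
    """ASSUMED smooth term: sum_k log(1 + exp(-t_k * x_k^T W y_k)) and its gradient."""
    m = t * np.einsum("ki,ij,kj->k", X, W, Y)   # margins t_k * s(x_k, y_k)
    loss = np.logaddexp(0.0, -m).sum()          # numerically stable log(1 + exp(-m))
    c = -t * np.exp(-np.logaddexp(0.0, m))      # dloss/ds_k = -t_k * sigmoid(-m_k)
    grad = X.T @ (c[:, None] * Y)               # sum_k c_k * x_k y_k^T
    return loss, grad


def objective(W, X, Y, t, lam):
    """f(W) + lam * ||W||_*."""
    loss, _ = logistic_loss_and_grad(W, X, Y, t)
    return loss + lam * np.linalg.svd(W, compute_uv=False).sum()


def apg_nuclear(X, Y, t, lam, iters=300):
    """FISTA-style accelerated proximal gradient for min_W f(W) + lam * ||W||_*."""
    # Global Lipschitz constant of grad f: 0.25 * lambda_max of the pair Gram matrix.
    G = (X @ X.T) * (Y @ Y.T)
    L = 0.25 * np.linalg.eigvalsh(G)[-1]
    W = np.zeros((X.shape[1], Y.shape[1]))
    Z, theta = W.copy(), 1.0
    for _ in range(iters):
        _, grad = logistic_loss_and_grad(Z, X, Y, t)
        W_next = prox_nuclear(Z - grad / L, lam / L)             # proximal gradient step at Z
        theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta**2)) / 2.0
        Z = W_next + ((theta - 1.0) / theta_next) * (W_next - W)  # Nesterov extrapolation
        W, theta = W_next, theta_next
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 5))   # toy "image" features (5-dim)
    Y = rng.standard_normal((40, 3))   # paired toy "text" features (3-dim)
    W_true = rng.standard_normal((5, 1)) @ rng.standard_normal((1, 3))  # rank-1 ground truth
    t = np.where(np.einsum("ki,ij,kj->k", X, W_true, Y) >= 0, 1.0, -1.0)
    W = apg_nuclear(X, Y, t, lam=1.0)
    print(objective(np.zeros((5, 3)), X, Y, t, 1.0), objective(W, X, Y, t, 1.0))
    print(np.linalg.svd(W, compute_uv=False))   # inspect how many singular values survive
```

The fixed step 1/L relies on a global bound: the logistic loss has curvature at most 1/4 in each score, and the scores are a linear map of W whose squared operator norm equals the largest eigenvalue of the Gram matrix with entries (x_k·x_l)(y_k·y_l). A backtracking line search is a common alternative when that bound is loose.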
Abstract
The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been proposed that matches features of different modalities directly. This approach raises two critical issues: how to remove the heterogeneity between different modalities, and how to match cross-modal features of different dimensions. Recently, metric learning methods have shown a good capability to learn a distance metric that captures the relationships between data points. However, traditional metric learning algorithms focus only on single-modal features and therefore have difficulty handling cross-modal features of different dimensions. In this paper, we propose a cross-modal similarity learning algorithm for cross-modal feature matching. The proposed method takes a bilinear…
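The dimensionality issue raised in the abstract can be made concrete. A Mahalanobis-style metric (x - y)^T M (x - y), typical of single-modal metric learning, is only defined when x and y live in the same space, whereas a bilinear score s(x, y) = x^T W y with a d_x × d_y matrix W compares an image feature and a text feature of different lengths directly. If W = U V^T has rank r, the same score equals the inner product (U^T x)·(V^T y) in an r-dimensional shared space. A minimal NumPy sketch; the feature sizes, rank, and function names are hypothetical.

```python
import numpy as np


def bilinear_similarity(x: np.ndarray, W: np.ndarray, y: np.ndarray) -> float:
    """s(x, y) = x^T W y; x has length d_x, y has length d_y, W is d_x x d_y."""
    return float(x @ W @ y)


def mahalanobis_sq(x: np.ndarray, y: np.ndarray, M: np.ndarray) -> float:
    """(x - y)^T M (x - y): only defined when x and y share one dimension."""
    d = x - y
    return float(d @ M @ d)


rng = np.random.default_rng(0)
d_x, d_y, r = 128, 10, 4                # hypothetical image/text feature sizes and rank
U = rng.standard_normal((d_x, r))
V = rng.standard_normal((d_y, r))
W = U @ V.T                             # rank-r bilinear matrix
x = rng.standard_normal(d_x)            # toy image feature
y = rng.standard_normal(d_y)            # toy text feature

s = bilinear_similarity(x, W, y)
s_shared = float((U.T @ x) @ (V.T @ y))  # same score, computed in the r-dim shared space
```

Calling mahalanobis_sq(x, y, M) with these x and y fails with a NumPy broadcasting error, which is the single-modal limitation the abstract describes.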
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
