Simple to Complex Cross-modal Learning to Rank
Minnan Luo, Xiaojun Chang, Zhihui Li, Liqiang Nie, Alexander G. Hauptmann, Qinghua Zheng

TL;DR
This paper introduces a self-paced, diversity-aware, non-linear cross-modal learning to rank method that improves multimedia retrieval by gradually learning from easy to complex rankings, enhancing robustness and generalization.
Contribution
It proposes a novel self-paced learning approach with diversity for cross-modal ranking using non-linear embeddings, surpassing linear models and traditional training strategies.
Findings
Significant performance improvements over state-of-the-art methods.
Enhanced robustness to outliers and better generalization.
Effective convergence with an efficient optimization algorithm.
Abstract
The heterogeneity gap between different modalities brings a significant challenge to multimedia information retrieval. Some studies formalize the cross-modal retrieval tasks as a ranking problem and learn a shared multi-modal embedding space to measure the cross-modality similarity. However, previous methods often establish the shared embedding space based on linear mapping functions, which might not be sophisticated enough to reveal more complicated inter-modal correspondences. Additionally, current studies assume that the rankings are of equal importance, and thus all rankings are used simultaneously, or a small number of rankings are selected randomly to train the embedding space at each iteration. Such strategies, however, always suffer from outliers as well as reduced generalization capability due to their lack of insightful understanding of the procedure of human cognition. In this…
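The easy-to-complex idea described above can be illustrated with a minimal sketch. It assumes the standard hard-threshold form of self-paced learning, where a ranking is used only if its current loss falls below an "age" threshold that grows over time. It is not the paper's diversity-aware, non-linear formulation; the function names, the pairwise hinge loss, and the example scores are all hypothetical choices made for illustration.

```python
def hinge_ranking_loss(score_pos, score_neg, margin=1.0):
    """Pairwise hinge loss: zero once the relevant item outscores
    the irrelevant one by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))


def self_paced_weights(losses, age):
    """Hard self-paced weights: 1 for 'easy' rankings (loss below `age`), else 0."""
    return [1 if loss < age else 0 for loss in losses]


def curriculum(losses, start_age, growth, rounds):
    """Return the indices selected at each round as the age threshold grows.

    Losses are held fixed here to keep the sketch short; in a real training
    loop they would be recomputed after every model update.
    """
    schedule = []
    age = start_age
    for _ in range(rounds):
        weights = self_paced_weights(losses, age)
        schedule.append([i for i, w in enumerate(weights) if w])
        age *= growth  # admit progressively harder rankings
    return schedule


# Hypothetical (relevant, irrelevant) similarity scores for four ranking pairs.
pairs = [(0.9, 0.1), (0.6, 0.4), (0.2, 0.7), (0.1, 3.0)]
losses = [hinge_ranking_loss(p, n) for p, n in pairs]
schedule = curriculum(losses, start_age=0.5, growth=2.0, rounds=4)
```

In this toy run the well-separated pair is selected first, and the pair with the largest violation (a plausible outlier) enters training only in the final round, which is how a self-paced schedule can limit the early influence of outliers.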
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
