Cross-Modal Learning via Pairwise Constraints
Ran He, Man Zhang, Liang Wang, Ye Ji, Qiyue Yin

TL;DR
This paper introduces a unified framework for cross-modal learning from text-image pairwise constraints, with algorithms for unsupervised clustering and supervised matching that improve semantic alignment and accuracy in multimedia applications.
Contribution
It proposes a general compound regularization framework and specific algorithms for cross-modal clustering and matching, addressing the semantic gap between modalities and outliers in the pairwise constraints.
Findings
Enhanced clustering and retrieval accuracy with joint text-image modeling.
Effective reduction of the semantic gap between modalities.
Demonstrated benefits of the proposed methods through extensive experiments.
Abstract
In multimedia applications, the text and image components in a web document form a pairwise constraint that potentially indicates the same semantic concept. This paper studies cross-modal learning via the pairwise constraint, and aims to find the common structure hidden in different modalities. We first propose a compound regularization framework to deal with the pairwise constraint, which can be used as a general platform for developing cross-modal algorithms. For unsupervised learning, we propose a cross-modal subspace clustering method to learn a common structure for different modalities. For supervised learning, to reduce the semantic gap and the outliers in pairwise constraints, we propose a cross-modal matching method based on compound ℓ2,1 regularization along with an iteratively reweighted algorithm to find the global optimum. Extensive experiments demonstrate the benefits of…
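The ℓ2,1 norm of a matrix is the sum of the Euclidean norms of its rows. Applied to a residual matrix whose rows are text-image pairs, it penalizes each pair linearly rather than quadratically, so badly mismatched (outlier) pairs pull less on the solution; applied to a projection matrix, it pushes whole rows toward zero. The sketch below is not the authors' algorithm. It assumes a single linear map W from one modality's features X to the paired modality's representation Y, and minimizes ||XW − Y||_{2,1} + λ||W||_{2,1} with a standard iteratively reweighted least-squares scheme for joint ℓ2,1 problems. The names `l21_norm` and `l21_matching` and the parameters `lam`, `n_iter`, and `eps` are hypothetical.

```python
import numpy as np


def l21_norm(M):
    """l2,1 norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def l21_matching(X, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Hypothetical sketch: minimize ||X W - Y||_{2,1} + lam * ||W||_{2,1} by reweighting.

    X: (n, d) features of one modality; Y: (n, k) representation of the other.
    Row i of X and row i of Y form one pairwise constraint.
    Returns W of shape (d, k) and the objective value after each iteration.
    """
    n, d = X.shape
    # Start from a ridge solution so that no row of W or of the residual is exactly zero.
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    history = []
    for _ in range(n_iter):
        R = X @ W - Y
        # Per-pair weights 1 / (2 ||r_i||): pairs with large residuals get small weight.
        g = 1.0 / (2.0 * np.maximum(np.linalg.norm(R, axis=1), eps))
        # Per-feature weights 1 / (2 ||w_j||): rows of W with small norm are shrunk harder.
        dw = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        XtG = X.T * g  # same as X.T @ diag(g), without building the n x n matrix
        # Stationarity of the reweighted quadratic: (X^T G X + lam D) W = X^T G Y.
        W = np.linalg.solve(XtG @ X + lam * np.diag(dw), XtG @ Y)
        history.append(l21_norm(X @ W - Y) + lam * l21_norm(W))
    return W, history
```

Each step solves a weighted least-squares problem whose weights come from the previous iterate. Because the update minimizes a quadratic upper bound of the objective, the recorded objective values should be non-increasing (up to the small `eps` safeguard against division by zero).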
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization
