GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
Haiwen Diao, Ying Zhang, Shang Gao, Jiawen Zhu, Long Chen, Huchuan Lu

TL;DR
This paper introduces GSSF, a novel similarity function for cross-modal metric learning that adaptively captures complex relationships between modalities, improving retrieval performance across various tasks.
Contribution
The paper proposes a Generalized Structural Sparse Function that dynamically models cross-channel relevancy while keeping the complexity of the similarity function moderate, and it improves both cross-modal and uni-modal retrieval.
Findings
Outperforms existing methods on image-text retrieval and re-identification tasks.
Demonstrates flexibility and effectiveness in multiple application scenarios.
Can be integrated into attention mechanisms and knowledge distillation frameworks.
Abstract
Cross-modal metric learning is a prominent research topic that bridges the semantic heterogeneity between vision and language. Existing methods frequently use either simple cosine similarity or complex distance metrics to transform pairwise features into a similarity score; the former is inadequate and the latter inefficient for distance measurement. Consequently, we propose a Generalized Structural Sparse Function that dynamically captures thorough and powerful relationships across modalities for pairwise similarity learning while remaining concise and efficient. Specifically, the distance metric encapsulates two kinds of terms, diagonal and block-diagonal, automatically distinguishing and highlighting cross-channel relevancy and dependency inside a structured and organized topology. It thereby adapts to the optimal matching patterns between the…
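The abstract describes a metric built from diagonal and block-diagonal terms. As a rough illustration only, the sketch below assumes a bilinear similarity x^T (D + B) y, where D is a diagonal matrix and B is a block-diagonal matrix over equally sized channel groups. This form, the function name `structured_sparse_similarity`, and the parameters `diag_w` and `block_w` are assumptions made for illustration and are not taken from the paper, whose actual formulation may differ.

```python
import numpy as np


def structured_sparse_similarity(x, y, diag_w, block_w):
    """Hypothetical bilinear similarity x^T (D + B) y.

    x, y:    (d,) feature vectors from the two modalities
    diag_w:  (d,) weights of the diagonal matrix D (assumed form)
    block_w: (g, k, k) weights of the block-diagonal matrix B,
             g blocks of size k with g * k == d (assumed form)
    """
    # L2-normalize so that D = I, B = 0 reduces to cosine similarity.
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)

    g, k, _ = block_w.shape
    assert g * k == x.shape[0] == y.shape[0] == diag_w.shape[0]

    # Diagonal term: per-channel weighting of matching channels.
    diag_term = np.sum(diag_w * x * y)

    # Block-diagonal term: channels interact only within their own group.
    xb = x.reshape(g, k)
    yb = y.reshape(g, k)
    block_term = np.einsum("gi,gij,gj->", xb, block_w, yb)

    return float(diag_term + block_term)


# Usage with random features and weights (illustrative values only).
rng = np.random.default_rng(0)
image_feat = rng.normal(size=8)
text_feat = rng.normal(size=8)
score = structured_sparse_similarity(
    image_feat, text_feat,
    diag_w=np.ones(8),
    block_w=0.1 * rng.normal(size=(4, 2, 2)),
)
print(score)
```

Under these assumptions the function uses d + g·k² weights instead of the d² weights of a dense bilinear form, which is one way to read the abstract's contrast between simple cosine similarity and complex distance metrics.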
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · Knowledge Distillation
