GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning

Haiwen Diao; Ying Zhang; Shang Gao; Jiawen Zhu; Long Chen; Huchuan Lu

arXiv:2410.15266·cs.CV·October 22, 2024

TL;DR

This paper introduces GSSF, a novel similarity function for cross-modal metric learning that adaptively captures complex relationships between modalities, improving retrieval performance across various tasks.

Contribution

It proposes a Generalized Structural Sparse Function that dynamically models cross-channel relevancy with a balanced complexity, enhancing cross-modal and uni-modal retrieval tasks.

Findings

01. Outperforms existing methods on image-text retrieval and re-identification tasks.

02. Demonstrates flexibility and effectiveness in multiple application scenarios.

03. Can be integrated into attention mechanisms and knowledge distillation frameworks.

Abstract

Cross-modal metric learning is a prominent research topic that bridges the semantic heterogeneity between vision and language. Existing methods frequently utilize simple cosine or complex distance metrics to transform the pairwise features into a similarity score, which suffers from an inadequate or inefficient capability for distance measurements. Consequently, we propose a Generalized Structural Sparse Function to dynamically capture thorough and powerful relationships across modalities for pair-wise similarity learning while remaining concise but efficient. Specifically, the distance metric delicately encapsulates two formats of diagonal and block-diagonal terms, automatically distinguishing and highlighting the cross-channel relevancy and dependency inside a structured and organized topology. Hence, it thereby empowers itself to adapt to the optimal matching patterns between the…
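To make the "diagonal and block-diagonal terms" in the abstract concrete, here is a minimal sketch of one way such a similarity could be written. It is not the authors' formulation, which the truncated abstract does not spell out. It assumes a bilinear score with fixed weights, whereas the paper describes the relationships as captured dynamically. The names `l2_normalize`, `structured_similarity`, `diag`, and `blocks` are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def structured_similarity(x, y, diag, blocks):
    """Illustrative bilinear score with diagonal and block-diagonal terms.

    ASSUMPTION: this form is a sketch, not the paper's definition.
      diag   : one weight per channel (same-channel matching).
      blocks : square matrices laid along the diagonal; each one couples
               the channels inside its own group (cross-channel matching).
    """
    d = len(x)
    if len(y) != d or len(diag) != d:
        raise ValueError("x, y and diag must have the same length")
    if sum(len(b) for b in blocks) != d:
        raise ValueError("block sizes must sum to the feature dimension")

    # Diagonal term: with all weights equal to 1 and unit-norm inputs,
    # this part alone is the cosine similarity.
    score = sum(w * a * b for w, a, b in zip(diag, x, y))

    # Block-diagonal term: channels interact only within their own group,
    # so the cost is far below that of a dense d x d metric.
    start = 0
    for block in blocks:
        k = len(block)
        xb, yb = x[start:start + k], y[start:start + k]
        for i in range(k):
            for j in range(k):
                score += xb[i] * block[i][j] * yb[j]
        start += k
    return score


# Toy features with four channels split into two groups of two.
image_feat = l2_normalize([1.0, 0.0, 0.0, 0.0])
text_feat = l2_normalize([0.0, 1.0, 0.0, 0.0])
ones = [1.0, 1.0, 1.0, 1.0]
no_coupling = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
coupled = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

cosine_like = structured_similarity(image_feat, text_feat, ones, no_coupling)
cross_channel = structured_similarity(image_feat, text_feat, ones, coupled)
```

In this toy case the two features occupy different channels, so the cosine-like score is zero, while the coupled block lets channel 0 of one feature match channel 1 of the other. Restricting the coupling to blocks is what keeps the structure sparse compared with a full matrix.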

Code & Models

Repositories

paranioar/gssf (PyTorch, official)

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · Knowledge Distillation