Joint Generalized Cosine Similarity: A Novel Method for N-Modal Semantic Alignment Based on Contrastive Learning
Yiqiao Chen, Zijian Huang

TL;DR
This paper introduces the Joint Generalized Cosine Similarity (JGCS), a novel similarity measure for multi-modal alignment that handles any number of vectors, improving contrastive learning with better robustness, efficiency, and scalability.
Contribution
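The paper proposes JGCS, which the authors describe as, to the best of their knowledge, the first similarity measure capable of handling an arbitrary number of vectors simultaneously, and builds on it a new contrastive learning paradigm that reportedly outperforms traditional methods in experiments.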
Findings
JGCS outperforms traditional methods in experiments.
The method shows noise robustness and computational efficiency.
It is scalable to multiple modalities and extendable to other domains.
Abstract
Alignment remains a crucial task in multi-modal deep learning, and contrastive learning has been widely applied in this field. However, when there are more than two modalities, existing methods typically calculate pairwise loss functions and aggregate them into a composite loss function for the optimization of model parameters. This limitation mainly stems from a drawback of traditional similarity measurement methods (i.e., they can only calculate the similarity between two vectors). To address this issue, we propose a novel similarity measurement method: the Joint Generalized Cosine Similarity (JGCS). Unlike traditional pairwise methods (e.g., dot product or cosine similarity), JGCS centers on the angle derived from the Gram determinant. To the best of our knowledge, this is the first similarity measurement method capable of handling tasks involving an arbitrary number of vectors.…
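The abstract names the Gram determinant as the quantity behind JGCS, but this excerpt does not give the exact JGCS formula. The following is therefore only a minimal sketch of the underlying building block, not the authors' method: it computes the Gram determinant of unit-normalized vectors and, for contrast, the mean of all pairwise cosine similarities that pairwise approaches aggregate. The function names (`gram_determinant`, `mean_pairwise_cosine`) are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gram_matrix(vectors):
    """Gram matrix of the unit-normalized vectors: entry (i, j) is cos(theta_ij)."""
    unit = [normalize(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular: the vectors are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def gram_determinant(vectors):
    """Squared volume of the parallelotope spanned by the normalized vectors.

    Illustrative only: how JGCS maps this value to an angle is not
    specified in the excerpt above.
    """
    return determinant(gram_matrix(vectors))


def mean_pairwise_cosine(vectors):
    """Pairwise baseline: average cosine similarity over all N(N-1)/2 pairs."""
    g = gram_matrix(vectors)
    pairs = list(combinations(range(len(vectors)), 2))
    return sum(g[i][j] for i, j in pairs) / len(pairs)
```

For two vectors at angle θ the Gram determinant of the normalized vectors equals 1 − cos²θ = sin²θ, so it is 0 when the vectors are aligned and 1 when they are orthogonal. Unlike the pairwise average, the same single quantity is defined for any number of vectors, which is the property the abstract emphasizes.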
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques
