COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval
Kush Juvekar, Anupam Purwar

TL;DR
This paper introduces COS-Mix, a hybrid retrieval method that combines cosine similarity and cosine distance to improve retrieval accuracy, especially for sparse data. Using both measures captures similarity and dissimilarity between vectors in high-dimensional spaces.
Contribution
The paper presents a novel hybrid retrieval strategy that fuses cosine similarity and cosine distance measures, improving retrieval performance over traditional methods.
Findings
Enhanced retrieval accuracy demonstrated on proprietary datasets.
Hybrid approach captures both similarity and dissimilarity effectively.
Improved understanding of semantic relationships in information retrieval.
Abstract
This study proposes a novel hybrid retrieval strategy for Retrieval-Augmented Generation (RAG) that integrates cosine similarity and cosine distance measures to improve retrieval performance, particularly for sparse data. The traditional cosine similarity measure is widely used to capture the similarity between vectors in high-dimensional spaces. However, it has been shown that this measure can yield arbitrary results in certain scenarios. To address this limitation, we incorporate cosine distance measures to provide a complementary perspective by quantifying the dissimilarity between vectors. We evaluate our approach on proprietary data, unlike recent publications that have used open-source datasets. The proposed method demonstrates enhanced retrieval performance and provides a more comprehensive understanding of the semantic relationships between documents or items. This hybrid…
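The two measures named in the abstract can be illustrated with a minimal Python sketch. The summary above does not state how COS-Mix actually fuses them, so the fusion rule here (rank by similarity, then drop candidates whose distance exceeds a threshold) is purely an assumption for illustration, as are the names `hybrid_retrieve`, `k`, and `max_distance`. The sketch uses the standard definition of cosine distance as 1 minus cosine similarity and assumes NumPy is available.

```python
import numpy as np


def cosine_similarity(query, docs):
    """Cosine similarity between one query vector and each row of docs."""
    query = np.asarray(query, dtype=float)
    docs = np.asarray(docs, dtype=float)
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    # Zero vectors can occur with sparse data; avoid division by zero.
    # Their dot product is 0, so their similarity comes out as 0.
    norms = np.where(norms == 0.0, 1.0, norms)
    return docs @ query / norms


def cosine_distance(query, docs):
    """Standard cosine distance: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(query, docs)


def hybrid_retrieve(query, docs, k=2, max_distance=0.5):
    """Hypothetical fusion rule (an assumption, not the paper's method):
    take the top-k documents by similarity, then keep only those whose
    cosine distance to the query is at most max_distance."""
    sims = cosine_similarity(query, docs)
    dists = 1.0 - sims
    top_k = np.argsort(-sims)[:k]
    return [
        (int(i), float(sims[i]), float(dists[i]))
        for i in top_k
        if dists[i] <= max_distance
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings, invented for this example.
    documents = [[1, 0], [0, 1], [1, 1]]
    for index, sim, dist in hybrid_retrieve([1, 0], documents, k=3):
        print(index, round(sim, 4), round(dist, 4))
```

In this toy setup the orthogonal document is ranked last by similarity and then removed by the distance threshold, which is one way a dissimilarity check can complement a similarity ranking.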
Taxonomy
Topics: Face and Expression Recognition · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques
