Variance-Adjusted Cosine Distance as Similarity Metric
Satyajeet Sahoo, Jhareswar Maiti

TL;DR
This paper identifies limitations of cosine similarity in non-Euclidean spaces with correlated data and proposes a variance-adjusted cosine distance that improves similarity measurement, achieving perfect accuracy in a breast cancer classification task.
Contribution
The paper introduces a novel variance-adjusted cosine similarity metric that overcomes the limitations of traditional cosine similarity in correlated data spaces.
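For reference, the traditional cosine similarity and one plausible variance-adjusted form are written out below. The adjusted form replaces the Euclidean inner product with one weighted by the inverse covariance matrix $\Sigma^{-1}$, in the spirit of the Mahalanobis distance. This is an illustrative assumption: the summary on this page does not state the authors' exact formula.

```latex
\cos(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}^\top \mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert},
\qquad
\cos_{\Sigma}(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}^\top \Sigma^{-1} \mathbf{y}}{\sqrt{\mathbf{x}^\top \Sigma^{-1}\mathbf{x}}\,\sqrt{\mathbf{y}^\top \Sigma^{-1}\mathbf{y}}},
\qquad
d_{\Sigma}(\mathbf{x},\mathbf{y}) = 1 - \cos_{\Sigma}(\mathbf{x},\mathbf{y})
```

With $\Sigma = I$ (uncorrelated, unit-variance features) $\cos_{\Sigma}$ reduces to the ordinary cosine. In general it equals the ordinary cosine computed after whitening each vector as $\mathbf{x} \mapsto \Sigma^{-1/2}\mathbf{x}$, because $(\Sigma^{-1/2}\mathbf{x})^\top(\Sigma^{-1/2}\mathbf{y}) = \mathbf{x}^\top\Sigma^{-1}\mathbf{y}$.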
Findings
Variance-adjusted cosine distance outperforms traditional cosine similarity.
KNN with the new metric achieves 100% accuracy on the Wisconsin Breast Cancer Dataset.
Traditional cosine similarity is valid only in Euclidean space; it becomes an inaccurate similarity measure when the data exhibit variance and correlation.
Abstract
Cosine similarity is a popular measure of the similarity between two vectors in an inner product space. It is widely used in many data classification algorithms, such as K-Nearest Neighbors (KNN) and clustering. This study demonstrates limitations in the application of cosine similarity. In particular, it shows that the traditional cosine similarity metric is valid only in Euclidean space, whereas the original data reside in a random variable space. When the data exhibit variance and correlation, cosine distance is not a fully accurate measure of similarity. While new similarity and distance metrics have been developed to compensate for the limitations of cosine similarity, these metrics serve as substitutes for cosine distance rather than modifying cosine distance itself to overcome its limitations. Subsequently, we propose a modified cosine…
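The contrast between traditional and variance-adjusted cosine similarity, and their use inside KNN, can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: "variance-adjusted" is read here as weighting the inner product by the inverse covariance matrix, the distance is taken as 1 minus the similarity, and the function names (`variance_adjusted_cosine_similarity`, `knn_predict`) and the 2-D covariance are hypothetical, chosen only for illustration.

```python
import numpy as np


def cosine_similarity(x, y):
    """Traditional cosine similarity: <x, y> / (||x|| * ||y||)."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def variance_adjusted_cosine_similarity(x, y, cov_inv):
    """Cosine similarity under the inner product <x, y>_S = x^T S^{-1} y.

    ASSUMPTION: weighting by the inverse covariance S^{-1} is an illustrative
    reading of "variance-adjusted"; it is not taken from the paper.
    """
    num = x @ cov_inv @ y
    den = np.sqrt((x @ cov_inv @ x) * (y @ cov_inv @ y))
    return float(num / den)


def knn_predict(X_train, y_train, x_query, cov_inv, k=3):
    """Majority vote among the k training rows with the smallest
    distance 1 - variance_adjusted_cosine_similarity to x_query."""
    dists = np.array([
        1.0 - variance_adjusted_cosine_similarity(row, x_query, cov_inv)
        for row in X_train
    ])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Hypothetical covariance of two strongly correlated features.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(cosine_similarity(a, b))                             # 0.0
print(variance_adjusted_cosine_similarity(a, b, cov_inv))  # approximately -0.9
```

The two vectors are orthogonal in the Euclidean sense, yet once the correlation between the features is accounted for they are no longer "unrelated", which is the kind of discrepancy the abstract attributes to traditional cosine similarity. In practice `cov_inv` would be estimated from the training data, for example with `np.linalg.inv(np.cov(X_train, rowvar=False))`.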
Taxonomy
Topics: Face and Expression Recognition
