The relation between Pearson's correlation coefficient r and Salton's cosine measure
Leo Egghe, Loet Leydesdorff

TL;DR
This paper derives the mathematical relationship between Pearson's r and Salton's cosine measure and tests it on author co-citation data, yielding a cosine threshold that can be used to improve visualization of the vector space.
Contribution
It reveals the relation between Pearson's correlation coefficient and the cosine measure in terms of the L1-norm and L2-norm of the vectors, and proposes an algorithm that yields a cosine threshold for optimizing the visualization.
Findings
The theoretical relation between Pearson's r and the cosine measure is confirmed by author co-citation data for 24 informetricians.
A threshold value for the cosine is proposed above which none of the corresponding Pearson correlations is negative.
Both the asymmetric occurrence matrix and the symmetric co-citation matrix support the theoretical results.
Abstract
The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which form together a cloud of points, being the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians for whom two matrices can be constructed, based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm which provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space.
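The dependence on the two norms can be illustrated with a short sketch. It rests on an assumption not stated in the abstract: the vectors are non-negative (as co-citation counts are), so the sum of a vector equals its L1-norm. Under that assumption, expanding the definition of Pearson's r for two vectors of length n gives r as a linear function of the cosine whose slope and intercept depend only on n and on the L1- and L2-norms of the two vectors. The function names below are illustrative, and the pairwise threshold shown is a direct consequence of this identity, not necessarily the algorithm specified in the paper.

```python
import math

def l2(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(a * a for a in v))

def cosine(x, y):
    """Salton's cosine: dot product divided by the product of L2-norms."""
    return sum(a * b for a, b in zip(x, y)) / (l2(x) * l2(y))

def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return cosine([a - mx for a in x], [b - my for b in y])

def pearson_from_cosine(x, y):
    """Pearson's r rewritten in terms of the cosine and the norms.

    Assumes non-negative entries, so sum(v) is the L1-norm of v.
    Undefined for constant vectors (zero variance gives a zero denominator).
    """
    n = len(x)
    l1x, l1y = sum(x), sum(y)
    l2x, l2y = l2(x), l2(y)
    num = n * cosine(x, y) * l2x * l2y - l1x * l1y
    den = math.sqrt((n * l2x ** 2 - l1x ** 2) * (n * l2y ** 2 - l1y ** 2))
    return num / den

def cosine_threshold(x, y):
    """Cosine value at which r changes sign for this pair (r >= 0 at or above it)."""
    n = len(x)
    return (sum(x) * sum(y)) / (n * l2(x) * l2(y))

if __name__ == "__main__":
    # Made-up count vectors, for illustration only.
    x = [3, 0, 1, 2]
    y = [1, 2, 0, 4]
    print(cosine(x, y), pearson(x, y), pearson_from_cosine(x, y))
    print(cosine_threshold(x, y))
```

For a fixed n, pairs of vectors with the same norm ratios fall on the same straight line in the (cosine, r) plane, and different ratios give different lines, which is consistent with the cloud of points described in the abstract.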
