An Analytical Approach to the Jaccard Similarity Index
Gonzalo Travieso, Alexandre Benatti, Luciano da F. Costa

TL;DR
This paper develops an analytical method to estimate the probability density of the Jaccard similarity index for real-valued data, enhancing understanding of data relationships in various scientific and analytical contexts.
Contribution
It introduces a novel analytical approach for estimating the probability density of the Jaccard similarity index for data with specific statistical densities, including uniform and normal distributions.
Findings
Analytical expressions for the probability density of the Jaccard index are derived.
The approach is applicable to data visualization, pattern recognition, and scientific modeling.
Extensions for regularization and control of similarity sharpness are proposed.
Abstract
The Jaccard similarity index has often been employed in science and technology as a means to quantify the similarity between two sets. When modified to operate on real-valued data, the Jaccard similarity index can be applied to compare vectors, an operation that plays a central role in visualization, classification, and modeling. The present work aims at developing an analytical approach for estimating the probability density of the Jaccard similarity values implied by sets of data elements characterized by specific statistical densities, with emphasis on the uniform and normal cases. Several theoretical and practical situations can benefit directly from such an approach, as it allows several properties of the similarity comparisons within a given dataset to be better understood and anticipated. Situations in which the described approach can be applied include the estimation…
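To make the abstract's setting concrete, the following is a minimal sketch of a real-valued Jaccard index and an empirical (Monte Carlo) estimate of its density for uniformly distributed data. It assumes the common min/max generalization for non-negative vectors, which may differ from the exact definition used in the paper; the function names `jaccard_real`, `sample_jaccard_uniform`, and `histogram_density` are hypothetical, and the sampling is a numerical stand-in for comparison, not the paper's analytical derivation.

```python
import random


def jaccard_real(x, y):
    """Min/max Jaccard similarity for two non-negative vectors (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this sketch only handles non-negative values")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical by convention.
    return 1.0 if denominator == 0 else numerator / denominator


def sample_jaccard_uniform(n_pairs, dim, seed=0):
    """Draw pairs of vectors with i.i.d. uniform entries in [0, 1) and compare them."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        values.append(jaccard_real(x, y))
    return values


def histogram_density(samples, bins=20):
    """Normalized histogram on [0, 1]; bar heights integrate to 1."""
    counts = [0] * bins
    for s in samples:
        index = min(int(s * bins), bins - 1)  # put s == 1.0 in the last bin
        counts[index] += 1
    width = 1.0 / bins
    return [c / (len(samples) * width) for c in counts]


samples = sample_jaccard_uniform(n_pairs=5000, dim=2)
density = histogram_density(samples, bins=20)
```

A histogram like `density` gives an empirical curve against which an analytical density expression could be checked; increasing `n_pairs` reduces the sampling noise.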
Taxonomy
Topics: Cognitive Computing and Networks
