Fuzzy Jaccard Index: A robust comparison of ordered lists

Matej Petković; Blaž Škrlj; Dragi Kocev; Nikola Simidjievski

arXiv:2008.02216 · cs.LG · October 6, 2021

TL;DR

The paper introduces FUJI, a new scale-invariant similarity measure for ranked lists that improves stability and accuracy over traditional methods, with theoretical analysis, efficient computation, and practical machine learning applications.

Contribution

It presents FUJI, a novel fuzzy Jaccard-based score for comparing ordered lists, with theoretical properties, an efficient algorithm, and demonstrated advantages in high-dimensional feature ranking.

Findings

01. FUJI outperforms benchmark similarity scores in robustness and efficiency.

02. Empirical tests show FUJI's effectiveness in synthetic scenarios.

03. Application to feature ranking improves interpretability and predictive performance.

Abstract

We propose Fuzzy Jaccard Index (FUJI) -- a scale-invariant score for assessment of the similarity between two ranked/ordered lists. FUJI improves upon the Jaccard index by incorporating a membership function which takes into account the particular ranks, thus producing both more stable and more accurate similarity estimates. We provide theoretical insights into the properties of the FUJI score as well as propose an efficient algorithm for computing it. We also present empirical evidence of its performance on different synthetic scenarios. Finally, we demonstrate its utility in a typical machine learning setting -- comparing feature ranking lists relevant to a given machine learning task. In real-life, and in particular high-dimensional domains, where only a small percentage of the whole feature space might be relevant, a robust and confident feature ranking leads to interpretable…
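
To make the idea in the abstract concrete, the following is a minimal sketch of how a rank-aware membership function can soften the plain Jaccard index of two top-k sets. It is not the paper's definition of FUJI: the function names and the particular membership function (1 inside the top k, decaying as k / rank below it) are assumptions chosen for illustration, since the exact formula does not appear on this page.

```python
def jaccard_top_k(ranking_a, ranking_b, k):
    """Plain Jaccard index of the two top-k sets (ranks inside the sets are ignored)."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


def membership(ranking, k):
    """Hypothetical membership function (an assumption, not taken from the paper):
    1.0 for items ranked within the top k, k / rank for items ranked lower."""
    return {
        item: 1.0 if pos < k else k / (pos + 1)  # pos is 0-based, rank is pos + 1
        for pos, item in enumerate(ranking)
    }


def fuzzy_jaccard_top_k(ranking_a, ranking_b, k):
    """Fuzzy Jaccard: sum of min memberships over sum of max memberships,
    taken over the union of the two top-k sets."""
    mu_a, mu_b = membership(ranking_a, k), membership(ranking_b, k)
    support = set(ranking_a[:k]) | set(ranking_b[:k])
    numerator = sum(min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in support)
    denominator = sum(max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in support)
    return numerator / denominator if denominator else 1.0


# Two feature rankings that differ only by a swap at positions 2 and 3.
ranking_a = ["f1", "f2", "f3", "f4"]
ranking_b = ["f1", "f3", "f2", "f4"]

crisp = jaccard_top_k(ranking_a, ranking_b, k=2)
fuzzy = fuzzy_jaccard_top_k(ranking_a, ranking_b, k=2)
print(f"Jaccard@2 = {crisp:.3f}, fuzzy Jaccard@2 = {fuzzy:.3f}")
```

With k = 2, the plain Jaccard index treats "f2" and "f3" as complete mismatches even though each sits just outside the other list's cutoff, while the fuzzy variant gives them partial credit. This is the kind of stability near the cutoff that the abstract attributes to taking ranks into account.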
