TL;DR
scikit-hubness is a Python package that addresses hubness, an aspect of the curse of dimensionality that impairs tasks such as classification, clustering, and visualization. It provides algorithms for hubness analysis, hubness reduction, and approximate neighbor search, and is integrated into the scikit-learn environment.
Contribution
The paper introduces a Python toolkit for hubness analysis, hubness reduction, and approximate neighbor search. The toolkit is integrated into the scikit-learn environment, which eases adoption in existing machine learning workflows.
Findings
Provides algorithms for hubness analysis and reduction
Enables efficient approximate neighbor search in high dimensions
Integrates into the scikit-learn environment and offers all functionality of the scikit-learn neighbors package
Abstract
This paper introduces scikit-hubness, a Python package for efficient nearest neighbor search in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality, and is known to impair various learning tasks, including classification, clustering, and visualization. scikit-hubness provides algorithms for hubness analysis ("Is my data affected by hubness?"), hubness reduction ("How can we improve neighbor retrieval in high dimensions?"), and approximate neighbor search ("Does it work for large data sets?"). It is integrated into the scikit-learn environment, enabling rapid adoption by Python-based machine learning researchers and practitioners. Users will find all functionality of the scikit-learn neighbors package, plus additional support for transparent hubness reduction and approximate nearest neighbor search. scikit-hubness is developed using several quality assessment…
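To make the hubness analysis question ("Is my data affected by hubness?") concrete, here is a minimal sketch in plain Python. It does not use the scikit-hubness API. It assumes one common way of quantifying hubness, the skewness of the k-occurrence distribution (how often each point appears among the k nearest neighbors of other points); the abstract itself does not name a specific measure. The function names `k_occurrence`, `skewness`, and `gaussian_sample` are hypothetical helpers introduced only for this illustration.

```python
import random


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbors of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Squared Euclidean distance from point i to every other point,
        # paired with the index so ties are broken by index order.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness (third standardized moment); 0.0 if all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def gaussian_sample(n, dim, rng):
    """Draw n points with i.i.d. standard normal coordinates (illustrative data only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Compare a low-dimensional and a higher-dimensional sample of the same size.
    for dim in (3, 50):
        counts = k_occurrence(gaussian_sample(200, dim, rng), k=10)
        print(f"dim={dim:2d}  k-skewness={skewness(counts):.2f}  max k-occurrence={max(counts)}")
```

Under this measure, a strongly right-skewed k-occurrence distribution means a few points ("hubs") appear in many neighbor lists while others appear in almost none. The brute-force search here is quadratic in the number of points and is only meant for small illustrative samples.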
