TL;DR
reval is a Python package that uses stability-based relative validation to select the clustering solution that generalizes best to unseen data. It fills a gap in open-source tooling, where most statistical software relies on internal validation metrics such as silhouette.
Contribution
The paper introduces reval, a Python package implementing stability-based relative validation, which selects clustering solutions by how well they generalize to unseen data.
Findings
Supports multiple clustering algorithms
Automates labeling and stability assessment
Provides an open-source tool for relative clustering validation
Abstract
Determining the best partition for a dataset can be a challenging task because of 1) the lack of a priori information within an unsupervised learning framework; and 2) the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to identify the best clustering solutions as those that generalize best to unseen data. Statistical software, both in R and Python, usually relies on internal validation metrics, such as silhouette, to select the number of clusters that best fits the data. Meanwhile, open-source software solutions that easily implement relative clustering techniques are lacking. Internal validation methods exploit characteristics of the data itself to produce a result, whereas relative approaches attempt to leverage the unknown underlying…
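The general idea behind stability-based relative validation can be sketched as follows. This is a minimal illustration built on scikit-learn and SciPy, not reval's own API: the function names `matched_error` and `stability_error` are hypothetical, and the choice of KMeans, a k-nearest-neighbors classifier, and a 50/50 split are assumptions made for the example. For each candidate number of clusters, the training half is clustered, a classifier learns those cluster labels, and its predictions on the held-out half are compared with an independent clustering of that half after optimally matching label names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def matched_error(reference, candidate, k):
    """Fraction of points that disagree after optimally permuting cluster labels."""
    # contingency[i, j] counts points with reference label i and candidate label j
    contingency = np.zeros((k, k), dtype=int)
    for r, c in zip(reference, candidate):
        contingency[r, c] += 1
    # Hungarian algorithm: choose the label permutation that maximizes agreement
    rows, cols = linear_sum_assignment(-contingency)
    return 1.0 - contingency[rows, cols].sum() / len(reference)


def stability_error(X, k, n_splits=5, seed=0):
    """Mean disagreement between predicted and clustered labels on held-out data."""
    errors = []
    for i in range(n_splits):
        X_tr, X_te = train_test_split(X, test_size=0.5, random_state=seed + i)
        # 1. Cluster the training half and treat the cluster labels as classes
        tr_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_tr)
        # 2. Fit a classifier on the training cluster labels (illustrative choice)
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, tr_labels)
        # 3. Cluster the held-out half independently
        te_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_te)
        # 4. Compare classifier predictions with the held-out clustering
        errors.append(matched_error(te_labels, clf.predict(X_te), k))
    return float(np.mean(errors))


# Synthetic data with three well-separated groups (assumed for the demo)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=0,
)
for k in range(2, 6):
    print(k, round(stability_error(X, k), 3))
```

A lower error means the partition found on one half of the data is reproduced more faithfully on the other half. Raw error tends to favor small numbers of clusters, so stability-based methods typically normalize it against a random-labeling baseline before comparing candidates; that step is omitted here for brevity.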
