
TL;DR
Cluster Forests is a novel clustering ensemble method inspired by Random Forests, which uses geometric probing and spectral clustering to improve clustering accuracy and robustness, supported by empirical and theoretical analysis.
Contribution
Introduces Cluster Forests, a new clustering ensemble approach that combines local clustering with spectral aggregation guided by a quality measure, enhancing robustness and performance.
Findings
CF outperforms competitors on real datasets
The kappa measure improves local clustering quality
Theoretical analysis provides bounds on mis-clustering rate
Abstract
With inspiration from Random Forests (RF) in the context of classification, a new clustering ensemble method, Cluster Forests (CF), is proposed. Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure kappa. CF progressively improves each local clustering in a fashion that resembles the tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis reveals that the kappa measure makes it possible to grow the local clustering in a desirable way: it is "noise-resistant". A closed-form expression is obtained for the mis-clustering rate of spectral clustering…
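The pipeline described in the abstract (random probing, kappa-guided growth of local clusterings, spectral aggregation) can be illustrated with a minimal sketch. Several pieces here are assumptions rather than details given above: "probing" is read as sampling random feature subsets, the base clusterer is a small hand-written k-means, kappa is stood in for by a within-to-between scatter ratio (lower is better), and the affinity matrix is the fraction of local clusterings in which two points share a cluster. All function and parameter names are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first initialisation; returns labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def kappa(X, labels):
    """ASSUMED quality measure: within-cluster / between-cluster scatter."""
    mu = X.mean(axis=0)
    within = between = 0.0
    for j in np.unique(labels):
        pts = X[labels == j]
        c = pts.mean(axis=0)
        within += ((pts - c) ** 2).sum()
        between += len(pts) * ((c - mu) ** 2).sum()
    return within / max(between, 1e-12)

def grow_local_clustering(X, k, rng, n_rounds=5, n_start=2):
    """Grow one local clustering: keep a new feature only if kappa drops."""
    p = X.shape[1]
    feats = [int(f) for f in rng.choice(p, size=min(n_start, p), replace=False)]
    labels = kmeans(X[:, feats], k, rng)
    best = kappa(X[:, feats], labels)
    for _ in range(n_rounds):
        rest = [f for f in range(p) if f not in feats]
        if not rest:
            break
        trial = feats + [int(rng.choice(rest))]
        trial_labels = kmeans(X[:, trial], k, rng)
        score = kappa(X[:, trial], trial_labels)
        if score < best:
            feats, labels, best = trial, trial_labels, score
    return labels

def cluster_forest(X, k, n_trees=20, seed=0):
    """Aggregate local clusterings with a normalized spectral embedding."""
    rng = np.random.default_rng(seed)
    n = len(X)
    affinity = np.zeros((n, n))
    for _ in range(n_trees):
        labels = grow_local_clustering(X, k, rng)
        affinity += labels[:, None] == labels[None, :]  # co-cluster indicator
    affinity /= n_trees
    deg = affinity.sum(axis=1)  # strictly positive: the diagonal is 1
    M = affinity / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    emb = vecs[:, -k:]  # top-k eigenvectors
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    return kmeans(emb, k, rng)
```

Calling `cluster_forest(X, k=2)` on an `(n, p)` NumPy array returns one integer label per row. The accept-only-if-kappa-improves loop is meant to mirror the tree-growth analogy in the abstract; the actual growth rule and kappa definition in the paper may differ.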
Taxonomy
Methods: Spectral Clustering
