Determining the Number of Clusters via Iterative Consensus Clustering

Shaina Race; Carl Meyer; Kevin Valakuzhy

arXiv:1408.0967 · stat.ML · August 6, 2014


TL;DR

This paper builds a consensus similarity matrix from a cluster ensemble and estimates the number of clusters from the eigenvalues of a random walk on that matrix. For noisy or high-dimensional data, an iterative procedure refines the consensus matrix toward block-diagonal form before the eigenvalue analysis.
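The eigenvalue step can be sketched as follows. This is a minimal illustration, not the authors' code: `estimate_k` is a hypothetical helper, and it assumes k is read off the largest gap between consecutive eigenvalues of the random-walk matrix P = D^{-1}C, where C is the consensus matrix and D holds its row sums. The exact selection rule in the paper may differ.

```python
import numpy as np


def estimate_k(C, max_k=None):
    """Estimate k from the spectrum of the random walk P = D^{-1} C (hypothetical helper).

    Assumes C is a symmetric, non-negative consensus matrix with positive row sums.
    Returns (k, eigenvalues sorted in descending order).
    """
    C = np.asarray(C, dtype=float)
    d = C.sum(axis=1)                          # row sums = node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # P = D^{-1} C is similar to S = D^{-1/2} C D^{-1/2}, which is symmetric,
    # so P's eigenvalues are real and can be computed with eigvalsh.
    S = d_inv_sqrt[:, None] * C * d_inv_sqrt[None, :]
    eig = np.sort(np.linalg.eigvalsh(S))[::-1]  # descending; eig[0] is 1
    if max_k is None:
        max_k = len(eig) - 1
    # Gaps lambda_i - lambda_{i+1} between consecutive eigenvalues.
    gaps = eig[:max_k] - eig[1:max_k + 1]
    k = int(np.argmax(gaps)) + 1               # eigenvalues above the largest gap
    return k, eig
```

Working with the symmetric matrix S rather than P itself avoids a general (non-symmetric) eigensolver while giving the same eigenvalues. For a consensus matrix with three nearly uncoupled diagonal blocks, the three leading eigenvalues are separated from the rest by the largest gap, so the sketch returns k = 3.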

Contribution

The paper proposes an iterative procedure that refines the consensus matrix toward block-diagonal form, which improves the spectral estimate of the number of clusters for noisy or high-dimensional data.
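The abstract does not spell out the refinement step, so the sketch below rests on one plausible reading that should be treated as an assumption: each iteration re-runs the ensemble on the rows of the current consensus matrix (each row being a point's similarity profile) and averages the results into a new consensus matrix. Plain k-means over several values of k stands in for the paper's "multiple algorithms"; `kmeans`, `ensemble_consensus`, and `iterative_consensus` are hypothetical names.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means, standing in for the ensemble's base algorithms."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def ensemble_consensus(X, ks, runs_per_k, rng):
    """Fraction of ensemble runs (over several k) that co-cluster each pair of points."""
    n = len(X)
    C = np.zeros((n, n))
    total = 0
    for k in ks:
        for _ in range(runs_per_k):
            lab = kmeans(X, k, rng)
            C += (lab[:, None] == lab[None, :]).astype(float)
            total += 1
    return C / total


def iterative_consensus(X, ks=(2, 3, 4, 5), runs_per_k=5, max_iter=10, tol=1e-3, seed=0):
    """Hypothetical refinement loop: re-cluster the rows of the current consensus matrix."""
    rng = np.random.default_rng(seed)
    C = ensemble_consensus(np.asarray(X, dtype=float), ks, runs_per_k, rng)
    for _ in range(max_iter):
        # Assumption: clustering the similarity profiles (rows of C) sharpens
        # the block structure of the next consensus matrix.
        C_new = ensemble_consensus(C, ks, runs_per_k, rng)
        if np.abs(C_new - C).max() < tol:  # stop early if the matrix stops changing
            return C_new
        C = C_new
    return C  # otherwise return the matrix after max_iter refinements
```

Because k-means is randomized, the matrix may keep changing slightly between iterations, so the loop is capped by `max_iter` rather than relying on the tolerance alone.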

Findings

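1. The consensus matrix is generally superior to existing similarity matrices for this kind of spectral analysis.

2. Eigenvalues of the random-walk transition matrix on the consensus graph indicate the number of clusters.

3. Iterative refinement pushes the consensus matrix toward block-diagonal form, improving the estimate for noisy or high-dimensional data.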

Abstract

We use a cluster ensemble to determine the number of clusters, k, in a group of data. A consensus similarity matrix is formed from the ensemble using multiple algorithms and several values for k. A random walk is induced on the graph defined by the consensus matrix and the eigenvalues of the associated transition probability matrix are used to determine the number of clusters. For noisy or high-dimensional data, an iterative technique is presented to refine this consensus matrix in a way that encourages a block-diagonal form. It is shown that the resulting consensus matrix is generally superior to existing similarity matrices for this type of spectral analysis.
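A common way to form a consensus similarity matrix is to record, for every pair of points, the fraction of ensemble clusterings that place them in the same cluster. The sketch below assumes that definition; `consensus_matrix` is a hypothetical helper, and the labelings may come from different algorithms and different values of k, since only co-membership matters, not the label IDs.

```python
import numpy as np


def consensus_matrix(labelings):
    """Fraction of ensemble members that place points i and j in the same cluster.

    labelings: list of 1-D integer label sequences, one per clustering in the
    ensemble (e.g. runs of several algorithms with several values of k).
    """
    labelings = [np.asarray(lab) for lab in labelings]
    n = labelings[0].shape[0]
    C = np.zeros((n, n))
    for lab in labelings:
        # Co-membership indicator: 1 where the two points share a label.
        C += (lab[:, None] == lab[None, :]).astype(float)
    return C / len(labelings)
```

For the three labelings `[0, 0, 1, 1]`, `[0, 0, 0, 1]`, and `[1, 1, 0, 0]`, points 0 and 1 are always together (entry 1), points 2 and 3 are together in two of three runs (entry 2/3), and points 0 and 2 share a cluster only once (entry 1/3).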
