An Experimental Comparison of Several Clustering and Initialization Methods
Marina Meila, David Heckerman

TL;DR
This paper compares three batch clustering algorithms and several initialization schemes on high-dimensional discrete data. It finds that the EM algorithm significantly outperforms the alternatives and that the initialization schemes studied yield models of similar quality.
Contribution
It provides an experimental comparison of clustering algorithms and of initialization schemes for EM, showing that EM is the most effective of the methods tested and that the initialization schemes considered produce models of similar quality.
Findings
EM significantly outperforms other clustering methods
Different initialization schemes produce similar quality models
Hierarchical clustering initialization can be effective for EM
Abstract
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner-take-all version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative…
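To make the difference between EM and its winner-take-all variant concrete, here is a minimal sketch, not the authors' implementation. It assumes binary variables (the paper uses general discrete variables), a hypothetical smoothing constant `alpha`, and an initialization in the spirit of scheme (2), where each cluster starts from randomly perturbed marginal frequencies. All function and parameter names (`fit`, `responsibilities`, `init_from_marginals`, `noise`) are illustrative. The only difference between the two algorithms is the `hard` flag, which replaces posterior cluster probabilities with a one-hot assignment to the most probable cluster.

```python
import math
import random

def log_joint(x, weights, theta):
    """Return log p(cluster=k, x) for every k under a binary naive-Bayes model."""
    out = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        for j, xj in enumerate(x):
            p = theta[k][j]
            lp += math.log(p if xj else 1.0 - p)
        out.append(lp)
    return out

def responsibilities(x, weights, theta, hard=False):
    """E-step for one record: soft posteriors (EM) or one-hot (winner-take-all)."""
    lj = log_joint(x, weights, theta)
    m = max(lj)
    if hard:
        best = lj.index(m)
        return [1.0 if k == best else 0.0 for k in range(len(lj))]
    unnorm = [math.exp(v - m) for v in lj]  # subtract max for numerical stability
    z = sum(unnorm)
    return [u / z for u in unnorm]

def log_likelihood(data, weights, theta):
    """Sum over records of log p(x), marginalizing the hidden cluster variable."""
    total = 0.0
    for x in data:
        lj = log_joint(x, weights, theta)
        m = max(lj)
        total += m + math.log(sum(math.exp(v - m) for v in lj))
    return total

def init_from_marginals(data, n_clusters, rng, noise=0.1):
    """Assumed initialization: perturb each variable's marginal frequency."""
    n, d = len(data), len(data[0])
    marg = [sum(x[j] for x in data) / n for j in range(d)]
    theta = [[min(0.95, max(0.05, marg[j] + rng.uniform(-noise, noise)))
              for j in range(d)] for _ in range(n_clusters)]
    return [1.0 / n_clusters] * n_clusters, theta

def fit(data, n_clusters, hard=False, n_iter=50, seed=0, alpha=1.0):
    """Run a fixed number of (hard or soft) EM iterations with smoothed updates."""
    rng = random.Random(seed)
    weights, theta = init_from_marginals(data, n_clusters, rng)
    n, d = len(data), len(data[0])
    for _ in range(n_iter):
        resp = [responsibilities(x, weights, theta, hard) for x in data]
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)  # expected (or hard) cluster count
            weights[k] = (nk + alpha) / (n + alpha * n_clusters)
            for j in range(d):
                on = sum(r[k] * x[j] for r, x in zip(resp, data))
                theta[k][j] = (on + alpha) / (nk + 2.0 * alpha)
    return weights, theta

# Toy data: two clearly separated groups of binary records.
data = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20
for hard in (False, True):
    w, t = fit(data, n_clusters=2, hard=hard)
    print("winner-take-all" if hard else "EM", log_likelihood(data, w, t))
```

The smoothing term keeps every parameter strictly between 0 and 1, so the logarithms stay finite even when the hard variant leaves a cluster empty. Comparing the printed log-likelihoods on held-out data, rather than on the training records used here, would be closer to the kind of model-quality comparison the abstract describes.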
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Bayesian Modeling and Causal Inference
