An Experimental Comparison of Several Clustering and Initialization Methods

Marina Meila; David Heckerman

arXiv:1301.7401·cs.LG·May 19, 2015·103 cites

Open Access

TL;DR

This paper compares three batch clustering algorithms and several initialization schemes on high-dimensional discrete data. It finds that the EM algorithm significantly outperforms the other two algorithms and that the initialization schemes yield models of similar quality.

Contribution

The paper provides an experimental comparison of clustering algorithms and initialization schemes, showing that EM is the most effective of the algorithms tested and that its final model quality is largely insensitive to the choice of initialization.

Findings

01. EM significantly outperforms other clustering methods

02. Different initialization schemes produce similar quality models

03. Hierarchical clustering initialization can be effective for EM

Abstract

We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner-take-all version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative…
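To make the setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary variables (the paper covers general discrete variables), so a naive-Bayes model with a hidden root reduces to a mixture of independent Bernoullis. The function names, the `noise` and `alpha` parameters, and the exact form of initialization schemes (1) and (2) are illustrative assumptions; scheme (3) is omitted. Setting `hard=True` gives a winner-take-all variant in which each case is assigned wholly to its most probable cluster.

```python
import numpy as np

def init_from_prior(n_clusters, n_vars, rng):
    """Scheme (1), assumed form: sample parameters from uniform Dirichlet/Beta priors."""
    weights = rng.dirichlet(np.ones(n_clusters))
    theta = rng.beta(1.0, 1.0, size=(n_clusters, n_vars))
    # Clip so that log(theta) and log(1 - theta) stay finite.
    return weights, np.clip(theta, 0.01, 0.99)

def init_from_marginals(X, n_clusters, rng, noise=0.1):
    """Scheme (2), assumed form: randomly perturb the per-variable data marginals."""
    marginals = X.mean(axis=0)
    theta = marginals + noise * rng.uniform(-1.0, 1.0, size=(n_clusters, X.shape[1]))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    return weights, np.clip(theta, 0.01, 0.99)

def fit(X, weights, theta, n_iter=50, hard=False, alpha=1.0):
    """EM for a Bernoulli mixture; hard=True is the winner-take-all variant.

    alpha is a smoothing pseudo-count (an assumption of this sketch) that keeps
    every parameter strictly inside (0, 1).
    """
    n, _ = X.shape
    k = len(weights)
    for _ in range(n_iter):
        # E-step: log p(cluster, case) for every case and cluster, shape (n, k).
        log_joint = (np.log(weights)
                     + X @ np.log(theta).T
                     + (1.0 - X) @ np.log(1.0 - theta).T)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        if hard:
            # Winner take all: replace soft responsibilities by one-hot assignments.
            resp = np.eye(k)[resp.argmax(axis=1)]
        # M-step: re-estimate parameters from (soft or hard) expected counts.
        counts = resp.sum(axis=0)
        weights = (counts + alpha) / (n + k * alpha)
        theta = (resp.T @ X + alpha) / (counts[:, None] + 2.0 * alpha)
    return weights, theta, resp

# Hypothetical usage on synthetic binary data.
rng = np.random.default_rng(0)
X = (rng.random((500, 20)) < 0.3).astype(float)
w_soft, theta_soft, _ = fit(X, *init_from_marginals(X, 3, rng))
w_hard, theta_hard, _ = fit(X, *init_from_prior(3, 20, rng), hard=True)
```

The two variants differ only in the single `if hard:` step, which is why the winner-take-all version resembles K-means: both alternate between assigning each case to one cluster and re-estimating that cluster's parameters.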

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Bayesian Modeling and Causal Inference