Bayesian Cluster Enumeration Criterion for Unsupervised Learning

Freweyni K. Teklehaymanot, Michael Muma, and Abdelhak M. Zoubir

arXiv:1710.07954 · math.ST · August 28, 2018

TL;DR

This paper derives a Bayesian Information Criterion (BIC) tailored to estimating the number of clusters in a data set. The derivation accounts for the data structure of the clustering problem, and the paper proposes a two-step cluster enumeration algorithm that is tested on synthetic and real data sets.

Contribution

It derives a new BIC for clustering that incorporates data structure and provides a practical two-step algorithm for cluster enumeration.

Findings

01. The new BIC improves cluster number estimation accuracy.

02. The proposed algorithm performs well on synthetic and real data.

03. Incorporating data structure into BIC alters the penalty term.
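
For reference, the penalty term mentioned in the findings can be read against the original (Schwarz) BIC. The form below is the standard textbook expression written in the "larger is better" sign convention; the symbols q_l (number of free parameters of candidate model M_l), N (number of observations), and the likelihood notation are assumptions chosen for illustration, and this is not the criterion derived in the paper.

```latex
\mathrm{BIC}(M_l) = \ln \mathcal{L}\bigl(\hat{\theta}_l \mid \mathcal{X}\bigr) - \frac{q_l}{2}\,\ln N
```

The second term is the penalty: it grows with the number of parameters and with the logarithm of the total sample size.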

Abstract

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models.…
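
The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract excerpt above is cut off before the second step, so the selection step here (score every candidate and keep the maximizer) is an assumption; 1-D k-means stands in for the model-based partitioning step; and the score is the original Schwarz BIC, not the criterion derived in the paper, whose penalty term differs. All function names (`kmeans_1d`, `schwarz_bic`, `enumerate_clusters`, `make_demo_data`) are hypothetical.

```python
import math
import random


def kmeans_1d(x, k, iters=100):
    """Step 1 stand-in: partition 1-D data into k clusters (Lloyd's algorithm)."""
    xs = sorted(x)
    # Deterministic initialization at evenly spaced quantiles of the data.
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    labels = [0] * len(x)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in x]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(x, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def schwarz_bic(x, labels, k):
    """Step 2 stand-in: original BIC, log-likelihood minus (q / 2) * ln N.

    Uses the classification log-likelihood of a 1-D Gaussian mixture with
    hard assignments. Higher is better. NOT the paper's criterion.
    """
    n = len(x)
    loglik = 0.0
    for j in range(k):
        members = [v for v, lab in zip(x, labels) if lab == j]
        nj = len(members)
        if nj < 2:
            return -math.inf  # variance cannot be estimated
        mean = sum(members) / nj
        var = sum((v - mean) ** 2 for v in members) / nj
        if var <= 0.0:
            return -math.inf
        loglik += nj * math.log(nj / n) - 0.5 * nj * (math.log(2 * math.pi * var) + 1)
    q = 3 * k - 1  # k means, k variances, k - 1 mixing weights
    return loglik - 0.5 * q * math.log(n)


def enumerate_clusters(x, candidates):
    """Score every candidate number of clusters and return the maximizer."""
    scores = {k: schwarz_bic(x, kmeans_1d(x, k), k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def make_demo_data(seed=0):
    """Three well-separated 1-D Gaussian clusters, 100 points each."""
    rng = random.Random(seed)
    return [rng.gauss(m, 1.0) for m in (-10.0, 0.0, 10.0) for _ in range(100)]
```

A call such as `enumerate_clusters(make_demo_data(), range(1, 7))` returns the selected number of clusters together with the score of each candidate. Swapping `schwarz_bic` for another criterion changes only the second step, which is the point of separating partitioning from scoring.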

Code & Models

Repositories

FreTekle/Bayesian-Cluster-Enumeration (official)
