Reliable Clustering of Bernoulli Mixture Models

Amir Najafi; Abolfazl Motahari; Hamid R. Rabiee

arXiv:1710.02101 · cs.LG · June 18, 2019

TL;DR

This paper gives what the authors believe to be the first non-asymptotic bounds on the sample complexity needed to reliably cluster data drawn from Bernoulli Mixture Models (BMMs). Such models arise in applications with binary data, from population genetics to activity analysis in social networks.

Contribution

The paper introduces non-asymptotic sample-complexity bounds for the PAC-clusterability of BMMs when the number of clusters is unknown.

Findings

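01. Derives conditions on the sample size and dimension of the model that guarantee PAC-clusterability of a dataset.

02. Establishes, to the best of the authors' knowledge, the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.

03. Provides theoretical guarantees for clustering BMM data when the number of clusters is unknown.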

Abstract

A Bernoulli Mixture Model (BMM) is a finite mixture of random binary vectors with independent dimensions. The problem of clustering BMM data arises in a variety of real-world applications, ranging from population genetics to activity analysis in social networks. In this paper, we analyze the clusterability of BMMs from a theoretical perspective, when the number of clusters is unknown. In particular, we stipulate a set of conditions on the sample complexity and dimension of the model in order to guarantee the Probably Approximately Correct (PAC)-clusterability of a dataset. To the best of our knowledge, these findings are the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.
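To make the model in the abstract concrete, the following is a minimal sketch of a BMM as a generative process: pick a cluster according to the mixing weights, then draw each binary coordinate independently from that cluster's Bernoulli parameters. It illustrates the model definition only and is not the paper's clustering procedure or analysis. The function names `sample_bmm` and `bmm_log_likelihood`, the parameter names, and the use of NumPy are assumptions made for this example.

```python
import numpy as np


def sample_bmm(weights, probs, n, rng):
    """Draw n binary vectors from a Bernoulli mixture (illustrative helper).

    weights: (K,) mixing proportions that sum to 1
    probs:   (K, d) success probability of each coordinate in each cluster
    Returns the (n, d) binary samples and the (n,) latent cluster labels.
    """
    weights = np.asarray(weights, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Step 1: latent cluster label for every sample.
    labels = rng.choice(len(weights), size=n, p=weights)
    # Step 2: coordinates are independent Bernoulli draws given the label.
    uniform = rng.random((n, probs.shape[1]))
    x = (uniform < probs[labels]).astype(np.int8)
    return x, labels


def bmm_log_likelihood(x, weights, probs):
    """Per-sample log-likelihood under the mixture.

    Assumes every entry of probs lies strictly between 0 and 1.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # log P(x | cluster k) = sum_j [x_j log p_kj + (1 - x_j) log(1 - p_kj)]
    per_cluster = x @ np.log(probs).T + (1.0 - x) @ np.log1p(-probs).T
    # Add log mixing weights, then log-sum-exp over clusters.
    return np.logaddexp.reduce(per_cluster + np.log(weights), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [0.5, 0.5]
    probs = [[0.9] * 20, [0.1] * 20]  # two well-separated clusters, d = 20
    x, labels = sample_bmm(weights, probs, n=1000, rng=rng)
    print(x.shape, bmm_log_likelihood(x, weights, probs).mean())
```

In this toy setting the two clusters differ in every coordinate, so larger dimension makes them easier to tell apart. The paper's conditions on sample size and dimension address when such separation is enough for reliable clustering.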
