A Bayesian non-parametric method for clustering high-dimensional binary data
Tapesh Santra

TL;DR
This paper introduces a Bayesian non-parametric clustering algorithm for high-dimensional binary data that automatically determines the number of clusters, outperforms the other methods tested in simulation studies, and is applied to real-world datasets.
Contribution
The paper presents a novel Bayesian non-parametric clustering method using a Dirichlet Process mixture model combined with simulated annealing for high-dimensional binary data.
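The summary does not give the model equations, so the following is only one plausible way to write down such an objective, under stated assumptions rather than the paper's own formulation. It assumes a Chinese restaurant process prior over the cluster assignment z with concentration α, and a Beta(a, b) prior on each cluster's per-feature Bernoulli probability, integrated out. Here N is the number of objects, D the number of binary features, K the number of clusters in z, n_k the size of cluster k, and m_{kd} the number of ones in feature d within cluster k. All of these symbols are introduced for illustration.

```latex
% Assumed prior over partitions (Chinese restaurant process), not taken from the paper
P(z \mid \alpha) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + N)} \, \alpha^{K} \prod_{k=1}^{K} \Gamma(n_k)

% Assumed collapsed Beta-Bernoulli likelihood, with B(\cdot,\cdot) the Beta function
P(X \mid z) = \prod_{k=1}^{K} \prod_{d=1}^{D} \frac{B(a + m_{kd},\; b + n_k - m_{kd})}{B(a, b)}

% Simulated annealing would then search for the partition maximising the log posterior
\hat{z} = \arg\max_{z} \left[ \log P(z \mid \alpha) + \log P(X \mid z) \right]
```

Because K appears in the objective itself, a search over z also selects the number of clusters, which matches the behaviour the paper describes.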
Findings
Outperforms all other clustering algorithms tested in the simulation studies.
Successfully clusters documents, handwritten images, and cancer data.
Automatically determines the optimal number of clusters.
Abstract
In many real-life problems, objects are described by a large number of binary features. For instance, documents are characterized by the presence or absence of certain keywords; cancer patients are characterized by the presence or absence of certain mutations, etc. In such cases, grouping together similar objects/profiles based on such high-dimensional binary features is desirable, but challenging. Here, I present a Bayesian non-parametric algorithm for clustering high-dimensional binary data. It uses a Dirichlet Process (DP) mixture model and simulated annealing to not only cluster binary data, but also find the optimal number of clusters in the data. The performance of the algorithm was evaluated and compared with other algorithms using simulated datasets. It outperformed all other clustering methods that were tested in the simulation studies. It was also used to cluster real datasets arising from…
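The abstract names the ingredients (a DP mixture model and simulated annealing) but not the details, so the sketch below is an illustration under assumptions, not the author's algorithm. It assumes a Chinese restaurant process prior with concentration `alpha`, a collapsed Beta(`a`, `b`)-Bernoulli likelihood per feature, single-point reassignment proposals, and a geometric cooling schedule. The names `log_posterior`, `anneal`, `t_start`, and `t_end` are hypothetical.

```python
import math
import random


def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_posterior(data, labels, alpha=1.0, a=1.0, b=1.0):
    """Unnormalised log posterior of a partition (assumed model, see lead-in)."""
    clusters = {}
    for row, k in zip(data, labels):
        clusters.setdefault(k, []).append(row)
    n = len(data)
    # Chinese restaurant process prior over the partition.
    score = len(clusters) * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
    for rows in clusters.values():
        size = len(rows)
        score += math.lgamma(size)
        # Collapsed Beta-Bernoulli marginal likelihood, one term per feature.
        for d in range(len(rows[0])):
            ones = sum(r[d] for r in rows)
            score += log_beta(a + ones, b + size - ones) - log_beta(a, b)
    return score


def anneal(data, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Search for a high-scoring partition; returns (best_labels, best_score)."""
    rng = random.Random(seed)
    labels = [0] * len(data)  # start with every object in one cluster
    current = log_posterior(data, labels)
    best_labels, best = labels[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(data))
        # Propose moving object i to an existing cluster or to a brand-new one.
        choices = sorted(set(labels)) + [max(labels) + 1]
        proposal = labels[:]
        proposal[i] = rng.choice(choices)
        cand = log_posterior(data, proposal)
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if cand >= current or rng.random() < math.exp((cand - current) / temp):
            labels, current = proposal, cand
            if current > best:
                best_labels, best = labels[:], current
    return best_labels, best


if __name__ == "__main__":
    # Toy data: two groups of binary profiles with opposite feature patterns.
    toy = [[1] * 5 + [0] * 5 for _ in range(4)] + [[0] * 5 + [1] * 5 for _ in range(4)]
    found_labels, found_score = anneal(toy)
    print(found_labels, len(set(found_labels)), found_score)
```

The number of clusters is never passed in: a proposal can always open a new cluster, and the prior term penalises partitions with many small clusters, so the search trades fit against complexity. Recomputing the full score at every step keeps the sketch short; a practical version would update only the two affected clusters.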
Taxonomy
Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Advanced Clustering Algorithms Research
