Simultaneous Clustering and Model Selection for Multinomial Distribution: A Comparative Study
Md. Abul Hasnat, Julien Velcin, Stéphane Bonnevay, Julien Jacques

TL;DR
This paper compares several clustering methods based on the multinomial distribution, introduces a novel hybrid approach, and evaluates them on synthetic and real datasets in terms of accuracy, stability, and computation time.
Contribution
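It compares existing multinomial model-based clustering methods across initialization, model estimation, and model selection, and proposes a new hybrid technique, combining partitional and hierarchical clustering, that improves stability and computation time.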
Findings
The proposed method is more stable than the existing techniques.
It achieves competitive clustering accuracy with lower computation time.
The study identifies appropriate strategies for clustering discrete data with MBC methods.
Abstract
In this paper, we study different discrete data clustering methods, which use the Model-Based Clustering (MBC) framework with the Multinomial distribution. Our study comprises several relevant issues, such as initialization, model estimation and model selection. Additionally, we propose a novel MBC method by efficiently combining the partitional and hierarchical clustering techniques. We conduct experiments on both synthetic and real data and evaluate the methods using accuracy, stability and computation time. Our study identifies appropriate strategies to be used for discrete data analysis with the MBC methods. Moreover, our proposed method is very competitive w.r.t. clustering accuracy and better w.r.t. stability and computation time.
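To make the abstract's terms concrete, the following is a minimal sketch of model-based clustering with a multinomial mixture: EM for model estimation and BIC for model selection. It is not the authors' hybrid method. The random soft initialization, the smoothing constant `eps`, the choice of BIC as the selection criterion, and the function names `fit_multinomial_mixture`, `bic`, and `select_k` are all assumptions made for illustration.

```python
import numpy as np

def fit_multinomial_mixture(X, k, n_iter=100, seed=0, eps=1e-10):
    """EM for a k-component multinomial mixture on a count matrix X (n x d).

    Illustrative sketch only; initialization and smoothing are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random soft assignment of each row to the k clusters.
    resp = rng.dirichlet(np.ones(k), size=n)
    for _ in range(n_iter):
        # M-step: mixing weights and per-cluster category probabilities.
        weights = resp.sum(axis=0) / n
        theta = resp.T @ X + eps                      # shape (k, d)
        theta /= theta.sum(axis=1, keepdims=True)
        # E-step: unnormalized log posteriors. The multinomial coefficient
        # is omitted because it is identical for every cluster.
        log_p = X @ np.log(theta).T + np.log(weights + eps)   # shape (n, k)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
    # Log-likelihood up to a constant that does not depend on k.
    log_lik = float(log_norm.sum())
    return resp.argmax(axis=1), log_lik

def bic(log_lik, k, d, n):
    """Bayesian information criterion; lower is better."""
    n_params = (k - 1) + k * (d - 1)   # free mixing weights + free probabilities
    return -2.0 * log_lik + n_params * np.log(n)

def select_k(X, candidates):
    """Fit one mixture per candidate k and return the k with the lowest BIC."""
    n, d = X.shape
    scores = {k: bic(fit_multinomial_mixture(X, k)[1], k, d, n)
              for k in candidates}
    return min(scores, key=scores.get), scores
```

Calling `select_k(X, range(1, 6))` on a count matrix `X` returns the selected number of clusters together with the BIC score of each candidate. Because EM depends on its starting point, rerunning with different `seed` values is one simple way to probe the stability that the study evaluates.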
