Clustering risk in Non-parametric Hidden Markov and I.I.D. Models
Elisabeth Gassiat, Ibrahim Kaddouri, Zacharie Naulet

TL;DR
This paper analyzes the Bayes risk of clustering in Hidden Markov and i.i.d. models, showing that clustering based on the Bayes classifier is nearly optimal even though it does not always coincide with the optimal Bayes clusterer.
Contribution
It provides a theoretical framework for understanding clustering risk in non-parametric HMMs and i.i.d. models, including bounds on the clustering excess risk of a plug-in Bayes classifier and a theoretical justification for its widespread use.
Findings
Bayes classifier is nearly optimal for clustering
Bounds on clustering excess risk are established
Simulations confirm theoretical results
Abstract
We conduct an in-depth analysis of the Bayes risk of clustering in the context of Hidden Markov and i.i.d. models. In both settings, we identify the situations where this risk is comparable to the Bayes risk of classification and those where its minimizer, the Bayes clusterer, can be derived from the Bayes classifier. While we demonstrate that clustering based on the Bayes classifier does not always match the optimal Bayes clusterer, we show that this difference is primarily theoretical and that the Bayes classifier remains nearly optimal for clustering. A key quantity emerges, capturing the fundamental difficulty of both classification and clustering tasks. Furthermore, by leveraging the identifiability of HMMs, we establish bounds on the clustering excess risk of a plug-in Bayes classifier in the general nonparametric setting, offering theoretical justification for its widespread use…
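To make the distinction between classification error and clustering error concrete, here is a minimal sketch in the i.i.d. setting. It is not taken from the paper: the two-component Gaussian mixture, the parameter values, and the function names are all illustrative assumptions, and the clustering error is assumed to be the usual misclassification rate minimized over relabelings of the classes. The code measures the empirical error of the clustering induced by the Bayes classifier; it does not compute the Bayes clusterer itself.

```python
import itertools
import math
import random


def classification_error(true_labels, predicted_labels):
    """Fraction of positions where the predicted label differs from the true one."""
    n = len(true_labels)
    return sum(t != p for t, p in zip(true_labels, predicted_labels)) / n


def clustering_error(true_labels, predicted_labels, n_classes):
    """Classification error minimized over all relabelings of the predicted classes.

    Assumed definition: a clustering is only judged up to a permutation of labels.
    """
    best = 1.0
    for perm in itertools.permutations(range(n_classes)):
        relabeled = [perm[p] for p in predicted_labels]
        best = min(best, classification_error(true_labels, relabeled))
    return best


def bayes_classifier(x, weights, means, sigma=1.0):
    """Return the class with the largest posterior under a KNOWN Gaussian mixture."""
    scores = [
        math.log(w) - 0.5 * ((x - m) / sigma) ** 2
        for w, m in zip(weights, means)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(n=2000, weights=(0.5, 0.5), means=(-1.0, 1.0), seed=0):
    """Draw i.i.d. labels and observations, then classify each observation.

    The mixture parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    xs = [rng.gauss(means[z], 1.0) for z in labels]
    preds = [bayes_classifier(x, weights, means) for x in xs]
    return labels, preds


labels, preds = simulate()
print("classification error:", classification_error(labels, preds))
print("clustering error:    ", clustering_error(labels, preds, 2))
```

Because the identity relabeling is one of the permutations considered, the clustering error computed this way can never exceed the classification error of the same predictions.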
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms
