Clustering risk in Non-parametric Hidden Markov and I.I.D. Models

Elisabeth Gassiat; Ibrahim Kaddouri; Zacharie Naulet

arXiv:2309.12238·math.ST·May 28, 2025·1 cites

Open Access

TL;DR

This paper analyzes the Bayes risk of clustering in Hidden Markov and i.i.d. models, showing that clustering based on the Bayes classifier is nearly optimal even though it does not always coincide with the optimal Bayes clusterer.

Contribution

It provides a theoretical framework for understanding clustering risk in non-parametric HMMs and i.i.d. models, including bounds on the clustering excess risk of a plug-in Bayes classifier that give theoretical justification for its widespread use.

Findings

01 Bayes classifier is nearly optimal for clustering

02 Bounds on clustering excess risk are established

03 Simulations confirm theoretical results

Abstract

We conduct an in-depth analysis of the Bayes risk of clustering in the context of Hidden Markov and i.i.d. models. In both settings, we identify the situations where this risk is comparable to the Bayes risk of classification and those where its minimizer, the Bayes clusterer, can be derived from the Bayes classifier. While we demonstrate that clustering based on the Bayes classifier does not always match the optimal Bayes clusterer, we show that this difference is primarily theoretical and that the Bayes classifier remains nearly optimal for clustering. A key quantity emerges, capturing the fundamental difficulty of both classification and clustering tasks. Furthermore, by leveraging the identifiability of HMMs, we establish bounds on the clustering excess risk of a plug-in Bayes classifier in the general nonparametric setting, offering theoretical justification for its widespread use…
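
The distinction the abstract draws between classification and clustering can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the clustering error is the misclassification rate minimized over relabelings of the clusters (the exact loss used by the authors is not stated in this excerpt), and it uses a two-component Gaussian mixture with known parameters in the i.i.d. setting purely as an illustrative model. The function names are hypothetical.

```python
import itertools
import numpy as np

def classification_error(true_labels, predicted_labels):
    """Fraction of observations whose predicted label differs from the true one."""
    return float(np.mean(np.asarray(true_labels) != np.asarray(predicted_labels)))

def clustering_error(true_labels, predicted_labels, n_classes):
    """Misclassification rate minimized over all relabelings of the clusters.

    Assumption: this permutation-invariant loss stands in for the paper's
    clustering loss, which is not defined in this excerpt.
    """
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    best = 1.0
    for perm in itertools.permutations(range(n_classes)):
        relabeled = np.asarray(perm)[predicted_labels]
        best = min(best, float(np.mean(relabeled != true_labels)))
    return best

def bayes_classifier(x, weights, means, scale=1.0):
    """Bayes classifier for a 1-D Gaussian mixture with known parameters."""
    x = np.asarray(x, dtype=float)[:, None]
    centered = (x - np.asarray(means)[None, :]) / scale
    log_posterior = np.log(weights)[None, :] - 0.5 * centered ** 2
    return np.argmax(log_posterior, axis=1)

# Illustrative i.i.d. mixture (parameters chosen arbitrarily for the sketch).
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
n_obs = 2000
labels = rng.choice(2, size=n_obs, p=weights)
observations = means[labels] + rng.normal(size=n_obs)

predicted = bayes_classifier(observations, weights, means)
print("classification error:", classification_error(labels, predicted))
print("clustering error:", clustering_error(labels, predicted, n_classes=2))
```

Because the identity relabeling is among the permutations considered, the clustering error computed this way can never exceed the classification error of the same predictions. The enumeration over permutations is only practical for a small number of classes.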

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms