Estimating the number of clusters of a Block Markov Chain
Thomas van Vuren, Thomas Cronk, Jaron Sanders

TL;DR
This paper introduces a spectral clustering method to estimate the number of clusters in data generated by Block Markov Chains, addressing the challenge of cluster number selection in sequential data analysis.
Contribution
It proposes a novel spectral embedding approach, with the embedding dimension set by singular value thresholding, for estimating the number of clusters in Block Markov Chain trajectories, backed by theoretical consistency results.
Findings
The method is asymptotically consistent despite the dependencies in the data.
It performs well even when the count matrix is sparse.
It outperforms alternative methods in numerical evaluations.
Abstract
Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with…
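The pipeline described in the abstract (count transitions, embed via thresholded singular values, count dense groups of points) can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation, and several parts of it are assumptions chosen for illustration. The simulator draws the next state uniformly within the target cluster. The embedding concatenates the left and right singular vectors scaled by their singular values. The threshold `tau = 5 * sqrt(T / n)` uses a hypothetical constant. Plain DBSCAN (with `eps=2.0`, `min_samples=5` on the rescaled embedding) stands in for the density-based clustering step. All helper names are hypothetical.

```python
import numpy as np


def simulate_bmc(p, sizes, length, rng):
    """Sample a Block Markov Chain trajectory (hypothetical helper).

    p[a, b] is the probability of jumping from cluster a to cluster b; the
    next state is then drawn uniformly from cluster b (an assumption). The
    cluster sequence is itself a Markov chain, so it is sampled first and
    states are then drawn within the sampled clusters.
    """
    sizes = np.asarray(sizes)
    offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))  # first state of each cluster
    cum = np.cumsum(p, axis=1)
    cum[:, -1] = 1.0  # guard against rounding in the last cumulative entry
    u = rng.random(length)
    c = np.empty(length, dtype=int)
    c[0] = rng.integers(len(sizes))
    for t in range(1, length):
        c[t] = np.searchsorted(cum[c[t - 1]], u[t], side="right")
    x = offsets[c] + (rng.random(length) * sizes[c]).astype(int)
    labels = np.repeat(np.arange(len(sizes)), sizes)  # true cluster of every state
    return x, labels


def count_matrix(x, n):
    """N[i, j] = number of observed transitions from state i to state j."""
    N = np.zeros((n, n))
    np.add.at(N, (x[:-1], x[1:]), 1)
    return N


def spectral_embedding(N, tau):
    """Keep singular triplets with sigma > tau and embed every state.

    Row i concatenates row i of U_r * S_r (outgoing transitions) and
    row i of V_r * S_r (incoming transitions); this layout is assumed.
    """
    U, s, Vt = np.linalg.svd(N)
    r = int(np.sum(s > tau))  # embedding dimension via singular value thresholding
    emb = np.hstack([U[:, :r] * s[:r], Vt[:r].T * s[:r]])
    return emb, r


def dbscan_num_clusters(X, eps, min_samples):
    """Plain DBSCAN; returns the number of clusters (noise points are ignored)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]  # includes the point itself
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    label = np.full(len(X), -1)
    k = 0
    for i in np.flatnonzero(core):
        if label[i] != -1:
            continue
        label[i] = k
        stack = [i]
        while stack:  # grow the cluster through density-connected core points
            j = stack.pop()
            if not core[j]:
                continue  # border points join the cluster but do not expand it
            for m in neighbors[j]:
                if label[m] == -1:
                    label[m] = k
                    stack.append(m)
        k += 1
    return k


# Usage on a simulated chain with 3 equally sized clusters (illustrative values).
rng = np.random.default_rng(0)
p = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
sizes = [20, 20, 20]
n, T = sum(sizes), 60_000
x, labels = simulate_bmc(p, sizes, T, rng)
N = count_matrix(x, n)
scale = np.sqrt(T / n)  # assumed natural scale of the count fluctuations
emb, r = spectral_embedding(N, tau=5 * scale)  # the constant 5 is hypothetical
K_hat = dbscan_num_clusters(emb / scale, eps=2.0, min_samples=5)
print("embedding dimension:", r, "estimated clusters:", K_hat)
```

In this simulated setting, states in the same cluster have identical expected rows and columns in the count matrix. They therefore collapse onto a common point in the embedding, and counting the dense groups of points estimates the number of clusters. The thresholding step keeps directions dominated by noise out of the embedding.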
Taxonomy
Topics: Bayesian Methods and Mixture Models
Methods: Sparse Evolutionary Training
