Fitting Sparse Markov Models to Categorical Time Series Using Convex Clustering
Tuhin Majumder, Soumendra Lahiri, Donald Martin

TL;DR
This paper introduces a convex clustering approach for fitting Sparse Markov Models (SMMs) to categorical time series. The approach controls the number of model parameters and is supported by a model selection consistency result and by simulation and real-data experiments.
Contribution
The paper proposes a novel convex clustering method for fitting SMMs, with theoretical guarantees and validation on simulated and real data.
Findings
The method achieves model selection consistency as the sample size grows.
Extensive simulations show good finite-sample performance.
A real-data application models and classifies disease subtypes.
Abstract
Higher-order Markov chains are frequently used to model categorical time series. However, a major problem with fitting such models is that the number of parameters grows exponentially in the model order. A popular approach to parsimonious modeling is the Variable Length Markov Chain (VLMC), which determines relevant contexts (recent pasts) of variable orders and forms a context tree. A more general parsimonious approach is given by Sparse Markov Models (SMMs), in which all possible histories of a fixed order are partitioned so that the histories within any one group share the same transition probability vector. In this paper, we develop an elegant method of fitting SMMs based on convex clustering and regularization. The regularization parameter is selected using the Bayesian Information Criterion (BIC). Theoretical results establish model selection consistency of our method for…
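The abstract compresses several technical steps. First, estimate a transition vector for every history of order m. Next, fuse histories with similar vectors through a convex clustering penalty. Finally, choose the regularization strength by BIC. For scale, a full order-m chain over an alphabet of size A has A^m(A-1) free transition parameters, while an SMM whose histories fall into K groups has only K(A-1).

The Python sketch below illustrates that pipeline under our own assumptions, because the abstract does not state the paper's objective. All of the following are hypothetical choices: the frequency-weighted least-squares fit on empirical transition vectors, the uniform pairwise ℓ2 fusion weights, the ADMM solver, reading clusters off exactly fused pairs, and the BIC form -2 log L + K(A-1) log N. The function names are hypothetical too. This is not the authors' implementation.

```python
import itertools

import numpy as np


def transition_counts(seq, order, alphabet_size):
    """Count the symbol that follows each observed length-`order` history.

    Returns the sorted list of observed histories and an (H, A) count matrix.
    """
    counts = {}
    for t in range(order, len(seq)):
        hist = tuple(int(s) for s in seq[t - order:t])
        row = counts.setdefault(hist, np.zeros(alphabet_size))
        row[int(seq[t])] += 1
    histories = sorted(counts)
    return histories, np.array([counts[h] for h in histories])


def convex_cluster(X, w, lam, nu=1.0, iters=500):
    """Scaled ADMM for
        min_U 0.5 * sum_i w_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||_2
    with uniform pairwise fusion weights (an assumption of this sketch).

    Returns integer cluster labels: connected components of the pairs whose
    split variable v_ij = u_i - u_j was shrunk exactly to zero.
    """
    n = X.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    D = np.zeros((len(pairs), n))  # incidence matrix: (D @ U)[l] = u_i - u_j
    for l, (i, j) in enumerate(pairs):
        D[l, i], D[l, j] = 1.0, -1.0
    W = np.diag(w)
    lhs = W + nu * D.T @ D  # same linear system in every U-update
    V = D @ X
    Z = np.zeros_like(V)  # scaled dual variable
    for _ in range(iters):
        U = np.linalg.solve(lhs, W @ X + nu * D.T @ (V - Z))
        R = D @ U + Z
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Group soft-thresholding = prox of (lam / nu) * ||.||_2, row by row.
        V = np.maximum(0.0, 1.0 - (lam / nu) / np.maximum(norms, 1e-12)) * R
        Z = Z + D @ U - V
    parent = list(range(n))  # union-find over exactly fused pairs

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for l, (i, j) in enumerate(pairs):
        if not V[l].any():
            parent[find(i)] = find(j)
    _, labels = np.unique([find(i) for i in range(n)], return_inverse=True)
    return labels


def bic(counts, labels):
    """BIC of the SMM that pools transition counts within each cluster."""
    loglik = 0.0
    for g in np.unique(labels):
        pooled = counts[labels == g].sum(axis=0)
        nz = pooled > 0
        loglik += np.sum(pooled[nz] * np.log(pooled[nz] / pooled.sum()))
    n_params = len(np.unique(labels)) * (counts.shape[1] - 1)  # K * (A - 1)
    return -2.0 * loglik + n_params * np.log(counts.sum())


def fit_smm(seq, order, alphabet_size, lambdas):
    """Fit one clustering per lambda and keep the one with the smallest BIC."""
    histories, counts = transition_counts(seq, order, alphabet_size)
    n_h = counts.sum(axis=1)
    X = counts / n_h[:, None]  # empirical transition vector of each history
    w = n_h / n_h.sum()  # assumed weights: relative frequency of each history
    best = None
    for lam in lambdas:
        labels = convex_cluster(X, w, lam)
        score = bic(counts, labels)
        if best is None or score < best[0]:
            best = (score, lam, labels)
    return histories, best


# Hypothetical usage: a 3-symbol, order-2 chain whose next-symbol law depends
# only on whether the last two symbols are equal (two true history groups).
rng = np.random.default_rng(0)
law = {True: [0.8, 0.1, 0.1], False: [0.1, 0.1, 0.8]}
seq = [0, 0]
for _ in range(3000):
    seq.append(int(rng.choice(3, p=law[seq[-2] == seq[-1]])))
histories, (score, lam, labels) = fit_smm(seq, 2, 3, np.linspace(0.0, 0.05, 11))
```

Refitting for each λ on a grid and keeping the minimum-BIC clustering mirrors the tuning step the abstract describes. A warm-started solution path would be cheaper when the alphabet or the order is larger.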
Taxonomy
Topics: Gene expression and cancer classification · Bayesian Methods and Mixture Models · Genomics and Phylogenetic Studies
