Markov Chain Order estimation with Conditional Mutual Information
Maria Papapetrou, Dimitris Kugiumtzis

TL;DR
This paper introduces a new method using Conditional Mutual Information (CMI) to estimate the order of Markov chains, providing a statistically rigorous test that outperforms some existing criteria, especially for larger orders.
Contribution
The paper develops a CMI-based significance testing approach for Markov chain order estimation, including analytic significance limits and a permutation-based test, improving accuracy over traditional methods.
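For reference, the tested quantity can be written out using standard information-theoretic identities. The notation below ($I_c(m)$ for the CMI of order $m$, $X_t$ for the chain variables, $H(\mathbf{x}_k)$ for the Shannon entropy of $k$ consecutive symbols) is assumed for illustration and may differ from the paper's own symbols.

```latex
\begin{align}
I_c(m) &= I\bigl(X_t ; X_{t-m} \mid X_{t-1}, \dots, X_{t-m+1}\bigr) \\
       &= H(X_t, Z) + H(X_{t-m}, Z) - H(Z) - H(X_t, X_{t-m}, Z),
          \qquad Z = (X_{t-1}, \dots, X_{t-m+1}) \\
       &= 2\,H(\mathbf{x}_m) - H(\mathbf{x}_{m-1}) - H(\mathbf{x}_{m+1})
          \qquad \text{(stationary chain)}
\end{align}
```

For a chain of order $L$, the conditioning set already contains the full memory whenever $m > L$, so $I_c(m) = 0$ for all $m > L$. This is what makes a significance test of $I_c(m)$ for increasing $m$ usable as an order estimator.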
Findings
CMI-testing outperforms AIC, BIC, and likelihood ratio tests for higher orders.
The method is validated through Monte Carlo simulations and applied to DNA sequences.
The method's effectiveness for large orders depends on how much data is available.
Abstract
We introduce the Conditional Mutual Information (CMI) for the estimation of the Markov chain order. For a Markov chain of symbols, we define CMI of order $m$, $I_c(m)$, as the mutual information of two variables in the chain being $m$ time steps apart, conditioning on the intermediate variables of the chain. We find approximate analytic significance limits based on the estimation bias of CMI and develop a randomization significance test of $I_c(m)$, where the randomized symbol sequences are formed by random permutation of the components of the original symbol sequence. The significance test is applied for increasing $m$ and the Markov chain order is estimated by the last order for which the null hypothesis is rejected. We present the appropriateness of CMI-testing on Monte Carlo simulations and compare it to the Akaike and Bayesian information criteria, the maximal fluctuation…
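The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain plug-in (frequency-count) entropy estimate, a one-sided permutation p-value, and reads "the last order for which the null hypothesis is rejected" as the largest rejected order up to a user-chosen `max_order`. All function names and parameters (`block_entropy`, `cmi`, `cmi_permutation_pvalue`, `estimate_order`, `n_perm`, `alpha`) are hypothetical.

```python
from collections import Counter

import numpy as np


def block_entropy(seq, k):
    """Plug-in Shannon entropy (in nats) of the length-k words in seq."""
    if k == 0:
        return 0.0
    n_words = len(seq) - k + 1
    counts = Counter(tuple(seq[i:i + k]) for i in range(n_words))
    p = np.array(list(counts.values()), dtype=float) / n_words
    return float(-np.sum(p * np.log(p)))


def cmi(seq, m):
    """CMI of order m via word entropies: 2 H(m) - H(m-1) - H(m+1).

    Assumes stationarity; the plug-in estimate can be slightly negative.
    """
    seq = list(seq)
    return (2.0 * block_entropy(seq, m)
            - block_entropy(seq, m - 1)
            - block_entropy(seq, m + 1))


def cmi_permutation_pvalue(seq, m, n_perm=99, rng=None):
    """One-sided p-value of the CMI of order m against shuffled surrogates."""
    rng = np.random.default_rng(rng)
    seq = np.asarray(seq)
    observed = cmi(seq, m)
    # Random permutation destroys all temporal structure in the sequence.
    surrogates = [cmi(rng.permutation(seq), m) for _ in range(n_perm)]
    exceed = sum(s >= observed for s in surrogates)
    return (exceed + 1) / (n_perm + 1)


def estimate_order(seq, max_order=3, alpha=0.05, n_perm=99, seed=0):
    """Largest m <= max_order whose CMI is significant (assumed reading)."""
    rng = np.random.default_rng(seed)
    order = 0
    for m in range(1, max_order + 1):
        if cmi_permutation_pvalue(seq, m, n_perm=n_perm, rng=rng) < alpha:
            order = m
    return order


# Usage: simulate a first-order binary chain and estimate its order.
sim_rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])  # illustrative transition matrix
chain = [0]
for _ in range(499):
    chain.append(int(sim_rng.choice(2, p=P[chain[-1]])))
print("estimated order:", estimate_order(chain, max_order=3, n_perm=50))
```

Because each order is tested at level `alpha`, an occasional false rejection at a higher order is expected, which will inflate the estimate. The sketch omits the analytic significance limits mentioned in the abstract and relies on the randomization test alone.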
