Avoiding Overfitting in Variable-Order Markov Models: a Cross-Validation Approach
Valeria Secchini, Javier Garcia-Bernardo, Petr Janský

TL;DR
This paper introduces DIVOP, a cross-validation-based algorithm that distinguishes meaningful higher-order dependencies in Markov models from noise, mitigating overfitting and improving predictive accuracy in complex systems.
Contribution
The paper presents DIVOP, a novel method using cross-validation to detect significant variable-order paths in Markov models, reducing overfitting and improving model interpretability.
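This summary does not spell out DIVOP's procedure, so the sketch below is not the authors' algorithm. It only illustrates the general idea of using held-out data to decide whether one second-order path (prev, cur) earns its place: the second-order estimate of P(next | prev, cur) is kept only if it predicts held-out transitions better than the first-order estimate of P(next | cur). The function names, the additive smoothing constant `alpha`, and the k-fold split over whole trajectories are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def count_transitions(trajectories):
    """Count first-order (cur -> next) and second-order ((prev, cur) -> next) transitions."""
    first = defaultdict(Counter)
    second = defaultdict(Counter)
    for traj in trajectories:
        for i in range(1, len(traj)):
            first[traj[i - 1]][traj[i]] += 1
        for i in range(2, len(traj)):
            second[(traj[i - 2], traj[i - 1])][traj[i]] += 1
    return first, second


def log_prob(counts, nxt, states, alpha):
    """Additively smoothed log P(nxt | context) from a Counter of next-state counts."""
    total = sum(counts.values())
    return math.log((counts[nxt] + alpha) / (total + alpha * len(states)))


def cv_prefers_second_order(trajectories, prev, cur, k=5, alpha=1.0, seed=0):
    """Hypothetical test: True if the second-order model for the path (prev, cur)
    scores a higher held-out log-likelihood than the first-order model for cur,
    summed over k folds of whole trajectories."""
    states = sorted({s for traj in trajectories for s in traj})
    idx = list(range(len(trajectories)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    ll_first = ll_second = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [trajectories[i] for i in idx if i not in held_out]
        first, second = count_transitions(train)
        for i in fold:
            traj = trajectories[i]
            for t in range(2, len(traj)):
                if (traj[t - 2], traj[t - 1]) == (prev, cur):
                    ll_first += log_prob(first[cur], traj[t], states, alpha)
                    ll_second += log_prob(second[(prev, cur)], traj[t], states, alpha)
    # If the path never occurs in held-out data, both sums stay 0 and we return False.
    return ll_second > ll_first


# Toy data: travellers who reach Paris from Milan go on to Rome,
# travellers who reach Paris from Berlin go on to London.
data = [["Milan", "Paris", "Rome"]] * 50 + [["Berlin", "Paris", "London"]] * 50
print(cv_prefers_second_order(data, "Milan", "Paris"))  # True
```

Scoring on held-out trajectories rather than on the training counts is the design choice that matters here: a second-order estimate fitted to a handful of observations can look different from the first-order one purely by chance, but that difference only pays off out of sample if the dependency is real.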
Findings
DIVOP outperforms existing algorithms on both synthetic and real datasets.
Application to corporate data reveals that 82% of dependencies involve tax havens.
DIVOP enables more reliable multi-step predictions.
Abstract
Higher-order Markov chain models are widely used to represent agent transitions in dynamic systems, such as passengers in transport networks. They capture transitions in complex systems by considering not only the current state but also the path of previously visited states. For example, the likelihood of train passengers traveling from Paris (current state) to Rome could increase significantly if their journey originated in Italy (prior state). Although this approach provides a more faithful representation of the system than first-order models, we find that commonly used methods, relying on Kullback-Leibler divergence, frequently overfit the data, mistaking fluctuations for higher-order dependencies and undermining forecasts and resource allocation. Here, we introduce DIVOP (Detection of Informative Variable-Order Paths), an algorithm that…
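The abstract's Paris/Rome example can be made concrete with a small sketch, assuming trajectories are stored as lists of city names; the function names and toy data are hypothetical. It compares the second-order distribution P(next | Milan, Paris) with the first-order distribution P(next | Paris) using the Kullback-Leibler divergence in bits. The specific thresholds or tests that existing KL-based methods apply are not described here, so this only shows the quantity they start from.

```python
import math
import random
from collections import Counter


def next_state_distribution(trajectories, context):
    """Empirical P(next | context); context holds the most recent states
    (length 1 = first order, length 2 = second order)."""
    k = len(context)
    counts = Counter()
    for traj in trajectories:
        for i in range(k, len(traj)):
            if tuple(traj[i - k:i]) == context:
                counts[traj[i]] += 1
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}


def kl_bits(p, q):
    """KL divergence D(p || q) in bits; assumes q[s] > 0 wherever p[s] > 0."""
    return sum(ps * math.log2(ps / q[s]) for s, ps in p.items() if ps > 0)


# Toy data echoing the abstract's example: arriving in Paris from Milan
# makes continuing to Rome more likely.
data = ([["Milan", "Paris", "Rome"]] * 8 + [["Milan", "Paris", "London"]] * 2
        + [["Berlin", "Paris", "Rome"]] * 2 + [["Berlin", "Paris", "London"]] * 8)
p1 = next_state_distribution(data, ("Paris",))          # Rome 0.5, London 0.5
p2 = next_state_distribution(data, ("Milan", "Paris"))  # Rome 0.8, London 0.2
print(round(kl_bits(p2, p1), 3))  # 0.278

# Pure noise: the next city is drawn independently of the origin, so there is
# no true second-order dependency, yet the empirical divergence is generally > 0.
rng = random.Random(1)
noisy = [[rng.choice(["Milan", "Berlin"]), "Paris", rng.choice(["Rome", "London"])]
         for _ in range(40)]
print(kl_bits(next_state_distribution(noisy, ("Milan", "Paris")),
              next_state_distribution(noisy, ("Paris",))))
```

Because the divergence is never negative, sampling fluctuations alone push it above zero even when the true process is first order. This is the kind of fluctuation the abstract warns can be mistaken for a higher-order dependency.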
Taxonomy
Topics: Simulation Techniques and Applications
