Optimal prediction of Markov chains with and without spectral gap
Yanjun Han, Soham Jana, Yihong Wu

TL;DR
This paper investigates the optimal risk of predicting the next state of a Markov chain from a single trajectory, a learning problem with dependent data. It shows how the spectral gap influences the rate and gives bounds under which the Markov rate matches that of an iid model.
Contribution
It characterizes the optimal prediction risk for Markov chains both without and with a prescribed spectral gap, and extends the results to higher-order chains.
Findings
Optimal prediction risk for k-state Markov chains is \(\Theta\left(\frac{k^2}{n}\log\frac{n}{k^2}\right)\) for \(3 \le k \le O(\sqrt{n})\).
The prediction risk matches that of iid models when the spectral gap is not too small.
Extensions to higher-order Markov chains are provided.
Abstract
We study the following learning problem with dependent data: Observing a trajectory of length \(n\) from a stationary Markov chain with \(k\) states, the goal is to predict the next state. For \(3 \le k \le O(\sqrt{n})\), using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be \(\Theta\left(\frac{k^2}{n}\log\frac{n}{k^2}\right)\), in contrast to the optimal rate of \(\Theta\left(\frac{\log\log n}{n}\right)\) for \(k = 2\) previously shown in Falahatgar et al. (2016). These rates, slower than the parametric rate of \(O\left(\frac{k^2}{n}\right)\), can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap…
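The prediction problem described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's optimal predictor: it uses a plain add-one (Laplace) estimate of the transition row for the last observed state, and it estimates the Kullback-Leibler prediction risk by Monte Carlo for a two-state chain. All function names (`sample_trajectory`, `add_one_predictor`, `estimated_risk`, `two_state_gap`) and the parameter choices are assumptions made for illustration only.

```python
import math
import random


def sample_trajectory(P, pi, n, rng):
    """Draw X_1..X_n from a chain with transition matrix P, started from pi."""
    k = len(P)
    x = rng.choices(range(k), weights=pi)[0]
    path = [x]
    for _ in range(n - 1):
        x = rng.choices(range(k), weights=P[x])[0]
        path.append(x)
    return path


def add_one_predictor(path, k):
    """Add-one estimate of the next-state distribution given the last state.

    Illustrative baseline only; not the estimator analysed in the paper.
    """
    last = path[-1]
    counts = [0] * k
    for a, b in zip(path, path[1:]):
        if a == last:
            counts[b] += 1
    total = sum(counts) + k
    return [(c + 1) / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def estimated_risk(P, pi, n, trials, seed=0):
    """Monte Carlo average of D(P[X_n] || predictor) over sampled trajectories."""
    rng = random.Random(seed)
    k = len(P)
    total = 0.0
    for _ in range(trials):
        path = sample_trajectory(P, pi, n, rng)
        total += kl(P[path[-1]], add_one_predictor(path, k))
    return total / trials


def two_state_gap(p, q):
    """Absolute spectral gap of the chain [[1-p, p], [q, 1-q]].

    Its eigenvalues are 1 and 1 - p - q, so the gap is 1 - |1 - p - q|.
    """
    return 1.0 - abs(1.0 - p - q)


if __name__ == "__main__":
    p, q = 0.1, 0.1
    P = [[1 - p, p], [q, 1 - q]]
    pi = [q / (p + q), p / (p + q)]  # stationary distribution
    print("spectral gap:", two_state_gap(p, q))
    for n in (100, 400, 1600):
        print(n, estimated_risk(P, pi, n, trials=2000))
```

Shrinking `p` and `q` makes the chain "stickier" and shrinks the spectral gap, which is the regime where the abstract says memory slows the rate. Comparing the printed risks across several values of `n` and of the gap gives a rough feel for that effect, though this baseline makes no claim to attain the optimal rates.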
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms · Algorithms and Data Compression
