Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients
Yang Xu, Vaneet Aggarwal

TL;DR
This paper introduces a fixed-policy evaluation method for finite Markov chains that uses minimal peripheral quotients to separate persistent behavior from transient effects, yielding more accurate and stable estimates.
Contribution
It identifies the peripheral invariant subspace as the source of ambiguity and develops a quotient-based decomposition that separates persistent and transient dynamics in Markov chains.
Findings
Reconstructs finite-horizon returns accurately.
Recovers statewise average reward.
Provides a stable estimator under a generative model.
Abstract
We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace of the transition matrix as the source of this ambiguity. Quotienting by this subspace is the minimal exact quotient that removes all non-decaying modes and makes the remaining dynamics strictly stable. After choosing a gauge projection, the reward admits a unique decomposition into a persistent regime profile and a gauge-fixed transient component. An exact comparison…
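The ambiguity described in the abstract can be seen on a small example. The sketch below is an illustration only, not the paper's quotient construction: the three-state chain, the reward vector, and the helper names (`mat_vec`, `expected_rewards`, `cesaro_average`) are all hypothetical. States 0 and 1 form a period-2 cycle and state 2 is transient. The Cesàro average (the gain) comes out near 0.5 for every state, yet the per-step expected reward keeps oscillating, which is the kind of persistent phase-dependent behavior that a gain and bias decomposition folds into the bias.

```python
def mat_vec(P, v):
    """Return the matrix-vector product P v for a list-of-lists matrix."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def expected_rewards(P, r, horizon):
    """Return [r, P r, P^2 r, ...]; entry t is E[reward at step t | start state]."""
    out, v = [], list(r)
    for _ in range(horizon):
        out.append(v)
        v = mat_vec(P, v)
    return out


def cesaro_average(P, r, horizon):
    """Average the first `horizon` expected-reward vectors, state by state."""
    steps = expected_rewards(P, r, horizon)
    return [sum(step[s] for step in steps) / horizon for s in range(len(r))]


# Hypothetical toy chain (an assumption, not taken from the paper):
# states 0 and 1 swap deterministically, state 2 leaks into state 0.
P = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5],
]
r = [1.0, 0.0, 2.0]

steps = expected_rewards(P, r, 1000)
gain = cesaro_average(P, r, 1000)

print("approximate gain per state:", gain)
print("expected reward at steps 998 and 999:", steps[998], steps[999])
```

Starting from state 0, the expected reward alternates between 1 and 0 forever, so no finite horizon makes it settle, even though its long-run average is well defined. From state 2, the early excess reward decays geometrically (a transient effect), while the oscillation inherited from the cycle does not decay.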
