Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients

Yang Xu; Vaneet Aggarwal

arXiv:2602.00474·stat.ML·May 11, 2026

TL;DR

This paper introduces a fixed-policy evaluation method for finite Markov chains, including reducible and periodic ones, that separates persistent behavior from transient effects by quotienting out the peripheral invariant subspace of the transition matrix, and reports a stable estimator under a generative model.

Contribution

The paper identifies the real peripheral invariant subspace $K(P)$ as the source of ambiguity in the classical gain and bias decomposition, and develops a quotient-based decomposition that separates persistent and transient dynamics in Markov chains.

Findings

01 Reconstructs finite-horizon returns accurately.

02 Recovers statewise average reward.

03 Provides a stable estimator under a generative model.

Abstract

We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $K(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $K(P)$ is the minimal exact quotient that removes all non-decaying modes and makes the remaining dynamics strictly stable. After choosing a gauge projection $\Pi$ with kernel $K(P)$, the reward admits a unique decomposition $r = g_{\Pi}^{\star} + (I - P) v_{\Pi}^{\star}$, where $g_{\Pi}^{\star}$ is a persistent regime profile and $v_{\Pi}^{\star}$ is a gauge-fixed transient component. An exact comparison…
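The decomposition in the abstract can be illustrated numerically. The sketch below is not the authors' algorithm: it assumes one particular gauge, the spectral projection that commutes with $P$, and it assumes $P$ is diagonalizable so that the projection can be read off an eigendecomposition. The 3-state chain, the reward vector, and the helper name `spectral_gauge` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical chain: state 0 is transient and feeds a period-2 cycle on states 1 and 2.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
r = np.array([1.0, 2.0, 0.0])  # illustrative reward vector
I = np.eye(3)

def spectral_gauge(P, tol=1e-8):
    """Return Pi = I - Q, with Q the spectral projection onto the
    eigenvectors whose eigenvalues have modulus 1 (the peripheral part).
    Assumption: P is diagonalizable, which holds for this example."""
    w, V = np.linalg.eig(P)
    V_inv = np.linalg.inv(V)
    peripheral = np.abs(np.abs(w) - 1.0) < tol
    Q = (V[:, peripheral] @ V_inv[peripheral, :]).real
    return np.eye(P.shape[0]) - Q

Pi = spectral_gauge(P)  # kernel of Pi is the peripheral subspace

# Persistent part: the component of r inside the peripheral subspace.
g = (I - Pi) @ r

# Transient part: v = sum_t P^t Pi r, computed as (I - P Pi)^{-1} Pi r.
# The inverse exists because P Pi has spectral radius below 1.
v = np.linalg.solve(I - P @ Pi, Pi @ r)

# Finite-horizon return over T steps, computed directly and via the
# telescoped form sum_t P^t g + v - P^T v.
T = 5
powers = [np.linalg.matrix_power(P, t) for t in range(T)]
direct = sum(Pt @ r for Pt in powers)
via_decomposition = sum(Pt @ g for Pt in powers) + v - np.linalg.matrix_power(P, T) @ v

print("g =", g)
print("v =", v)
print("residual =", np.abs(g + (I - P) @ v - r).max())
```

For this example the sketch gives $g \approx (2/3, 2, 0)$ and $v \approx (2/3, 0, 0)$. The Cesàro gain of the same chain is the constant vector $(1, 1, 1)$, so under this gauge choice $g$ keeps the phase-dependent rewards on the 2-cycle that the classical gain averages away.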
