Linking PageRank, Time Reversal, and Policy Evaluation
Konstantin Avrachenkov, Lorenzo Gregoris, Nelly Litvak

TL;DR
This paper links policy evaluation in Markov decision processes (MDPs) to PageRank: for a fixed policy, the value function of a discounted MDP can be obtained, up to an explicit rescaling, from the PageRank vector of a time-reversed Markov chain.
Contribution
It establishes a connection between MDP policy evaluation and PageRank, extends it to undiscounted MDPs with terminal states and to transition-dependent rewards, and proves a decomposition theorem for arbitrary finite MDPs.
Findings
Policy evaluation reduces to a collection of PageRank problems on the recurrent and transient components of the policy-induced Markov chain.
The approach extends to undiscounted MDPs with terminal states.
Numerical examples demonstrate efficiency on large graphs.
Abstract
We establish a connection between policy evaluation in Markov decision processes and PageRank in network analysis. For a fixed policy, we show that the value function of a discounted Markov decision process can be obtained, up to an explicit rescaling, from the PageRank vector of a suitably defined time-reversed Markov chain. In this correspondence, the discount factor plays the role of the teleportation parameter, while rewards induce the restart distribution. Beyond the irreducible case, invoking quasi-stationary distributions and Doob h-transforms, we prove a general decomposition theorem showing that policy evaluation for arbitrary finite MDPs reduces to a collection of PageRank problems on the recurrent and transient components of the policy-induced Markov chain. This framework naturally extends to undiscounted MDPs with terminal states and to transition-dependent rewards. We…
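Illustration
The correspondence in the abstract can be checked numerically in the irreducible case. The sketch below is not taken from the paper; it is one way to realize the stated correspondence, under the assumptions that the policy-induced transition matrix P is irreducible with stationary distribution mu, and that the rewards r are nonnegative and not all zero. The time-reversed chain is taken as Q[i, j] = mu[j] * P[j, i] / mu[i], the restart distribution as s proportional to mu * r, and the teleportation parameter as the discount factor gamma. Under these assumptions the value function is recovered as v[i] = (mu @ r) * pi[i] / ((1 - gamma) * mu[i]), where pi is the PageRank vector. All function names and the example matrix are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an irreducible row-stochastic matrix P."""
    n = P.shape[0]
    # Solve mu^T (P - I) = 0 together with sum(mu) = 1 by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


def pagerank(Q, s, alpha, tol=1e-12, max_iter=10_000):
    """Power iteration for pi = alpha * pi Q + (1 - alpha) * s (row vectors)."""
    pi = s.copy()
    for _ in range(max_iter):
        new = alpha * (pi @ Q) + (1.0 - alpha) * s
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi


def value_via_pagerank(P, r, gamma):
    """Value function from PageRank on the time-reversed chain.

    Assumes P is irreducible and r is nonnegative with at least one
    positive entry (assumptions of this sketch, not of the paper).
    """
    mu = stationary_distribution(P)
    # Time reversal: Q[i, j] = mu[j] * P[j, i] / mu[i].
    Q = (P.T * mu) / mu[:, None]
    scale = mu @ r
    s = mu * r / scale              # restart distribution induced by rewards
    pi = pagerank(Q, s, alpha=gamma)
    # Explicit rescaling back to the value function.
    return scale * pi / ((1.0 - gamma) * mu)


# Hypothetical 3-state example.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

v_pagerank = value_via_pagerank(P, r, gamma)
# Reference: direct solve of the Bellman equation v = r + gamma * P v.
v_direct = np.linalg.solve(np.eye(3) - gamma * P, r)
```

Comparing v_pagerank with v_direct, for example with np.allclose, gives a quick check of the rescaling. The general case with transient states or rewards of mixed sign is handled in the paper by the decomposition theorem and is not covered by this sketch.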
