Hitting time for Markov decision process
Ruichao Jiang, Javad Tavakoli, and Yiqiang Zhao

TL;DR
This paper introduces a new definition of hitting time for Markov decision processes (MDPs). Instead of using the hitting time of the chain induced by the MDP, it associates the MDP with a Markov process whose stationary distribution matches the MDP's normalized occupancy measure, and uses that process's hitting time.
Contribution
The authors propose a new method to define MDP hitting time via an associated Markov process whose stationary distribution aligns with the MDP's occupancy measure.
Findings
Established a relationship between MDPs and PageRank.
Constructed a Markov process with matching stationary distribution.
Defined MDP hitting time based on the associated Markov process.
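
The PageRank connection can be illustrated with a short derivation. This is a sketch under assumptions that do not appear on this page: a fixed policy with transition matrix $P_\pi$, an initial distribution $\mu$, a discount factor $\gamma \in (0,1)$, and the usual discounted occupancy measure. The paper's own construction may differ in its details.

```latex
% Normalized discounted occupancy measure (assumed definition):
d^\top = (1-\gamma)\sum_{t=0}^{\infty} \gamma^t \mu^\top P_\pi^t
       = (1-\gamma)\,\mu^\top (I - \gamma P_\pi)^{-1}.

% Multiply on the right by (I - \gamma P_\pi) and rearrange:
d^\top = \gamma\, d^\top P_\pi + (1-\gamma)\,\mu^\top .

% Since d^\top \mathbf{1} = 1, the restart term can be written as
% (1-\gamma)\, d^\top \mathbf{1}\mu^\top, giving
d^\top = d^\top \bigl(\gamma P_\pi + (1-\gamma)\,\mathbf{1}\mu^\top\bigr).
```

Under these assumptions, $d$ is a stationary distribution of the chain $\gamma P_\pi + (1-\gamma)\mathbf{1}\mu^\top$, which has the same form as the PageRank "random surfer" matrix with damping factor $\gamma$ and restart distribution $\mu$.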
Abstract
We define the hitting time for a Markov decision process (MDP). We do not use the hitting time of the Markov process induced by the MDP because the induced chain may not have a stationary distribution. Even if it has a stationary distribution, that distribution may not coincide with the (normalized) occupancy measure of the MDP. We observe a relationship between the MDP and PageRank. Using this observation, we construct a Markov process (MP) whose stationary distribution coincides with the normalized occupancy measure of the MDP, and we define the hitting time of the MDP as the hitting time of the associated MP.
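
The idea in the abstract can be checked numerically on a toy example. The sketch below is not the authors' code. It assumes a fixed policy, a discounted occupancy measure, and a PageRank-style restart chain `gamma * P + (1 - gamma) * 1 mu^T` as the associated Markov process. The function names, the 3-state chain, and the values of `mu` and `gamma` are all hypothetical choices made for illustration.

```python
import numpy as np

def normalized_occupancy(P, mu, gamma):
    """Normalized discounted occupancy d^T = (1 - gamma) mu^T (I - gamma P)^{-1}.

    P is the policy-induced transition matrix, mu the initial distribution
    (both are assumptions of this sketch, not taken from the paper).
    """
    n = P.shape[0]
    # Solve the transposed system (I - gamma P)^T d = (1 - gamma) mu.
    return np.linalg.solve((np.eye(n) - gamma * P).T, (1 - gamma) * mu)

def restart_chain(P, mu, gamma):
    """PageRank-style chain: follow P with prob. gamma, restart from mu otherwise."""
    n = P.shape[0]
    return gamma * P + (1 - gamma) * np.outer(np.ones(n), mu)

def expected_hitting_times(Q, target):
    """Expected number of steps to reach `target` from each state of chain Q."""
    n = Q.shape[0]
    others = [s for s in range(n) if s != target]
    # For s != target: h(s) = 1 + sum_{s' != target} Q[s, s'] h(s').
    A = np.eye(n - 1) - Q[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(A, np.ones(n - 1))
    return h

# Hypothetical induced chain: 0 -> 1 -> 2, with state 2 absorbing.
# Its only stationary distribution is (0, 0, 1), which is not the occupancy measure.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
mu = np.array([1.0, 0.0, 0.0])  # always start in state 0
gamma = 0.9

d = normalized_occupancy(P, mu, gamma)
Q = restart_chain(P, mu, gamma)
h = expected_hitting_times(Q, target=0)

print("occupancy d:", d)
print("d is stationary for Q:", np.allclose(d @ Q, d))
print("hitting times to state 0:", h)
```

In this example the induced chain never returns to state 0, so its hitting time to state 0 from states 1 and 2 is infinite, while the restart chain gives a finite value determined by the restart probability `1 - gamma`.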
Taxonomy
Topics: Data Management and Algorithms · Advanced Text Analysis Techniques · Bayesian Modeling and Causal Inference
