Hitting time for Markov decision process

Ruichao Jiang, Javad Tavakoli, and Yiqiang Zhao

arXiv:2205.03476·cs.LG·May 12, 2022

Open Access

TL;DR

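This paper defines the hitting time of a Markov decision process (MDP) as the hitting time of an associated Markov process whose stationary distribution matches the MDP's normalized occupancy measure. It avoids the chain induced by the MDP, which may have no stationary distribution, or one that differs from the occupancy measure.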

Contribution

The authors propose a new method to define MDP hitting time via an associated Markov process whose stationary distribution aligns with the MDP's occupancy measure.
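The matching of stationary distribution and occupancy measure can be illustrated with a small numerical check. The sketch below rests on assumptions not stated on this page: it takes the occupancy measure to be the discounted one, (1 - gamma) * sum_t gamma^t * mu P^t, and it uses a PageRank-style restart chain gamma * P + (1 - gamma) * 1 mu^T as the associated process. The matrix `P`, the initial distribution `mu`, the discount `gamma`, and all function names are hypothetical, and the authors' actual construction may differ.

```python
def vec_mat(v, M):
    """Row vector times matrix, using plain lists."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def normalized_occupancy(P, mu, gamma, tol=1e-12):
    """Assumed definition: (1 - gamma) * sum_t gamma^t * mu P^t (truncated series)."""
    d = [0.0] * len(mu)
    term = [(1.0 - gamma) * m for m in mu]      # t = 0 term
    while sum(term) > tol:                      # term mass is (1 - gamma) * gamma^t
        d = [a + b for a, b in zip(d, term)]
        term = [gamma * x for x in vec_mat(term, P)]
    return d

def restart_chain(P, mu, gamma):
    """PageRank-style chain: follow P w.p. gamma, restart from mu w.p. 1 - gamma."""
    n = len(P)
    return [[gamma * P[i][j] + (1.0 - gamma) * mu[j] for j in range(n)]
            for i in range(n)]

def stationary(G, iters=2000):
    """Stationary distribution by power iteration from the uniform distribution."""
    pi = [1.0 / len(G)] * len(G)
    for _ in range(iters):
        pi = vec_mat(pi, G)
    return pi

# Hypothetical chain induced by a fixed policy; state 2 is absorbing.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
mu = [1.0, 0.0, 0.0]   # hypothetical initial distribution
gamma = 0.9            # hypothetical discount factor

d = normalized_occupancy(P, mu, gamma)
G = restart_chain(P, mu, gamma)
pi = stationary(G)
print("occupancy :", d)
print("stationary:", pi)
```

In this toy example the induced chain `P` has all of its stationary mass on the absorbing state 2, while the occupancy measure spreads mass over all three states. The restart chain `G`, by contrast, has a stationary distribution that agrees with the occupancy measure up to numerical tolerance.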

Findings

01. Established a relationship between MDPs and PageRank.

02. Constructed a Markov process with matching stationary distribution.

03. Defined MDP hitting time based on the associated Markov process.

Abstract

We define the hitting time for a Markov decision process (MDP). We do not use the hitting time of the Markov process (MP) induced by the MDP because the induced chain may not have a stationary distribution. Even if it has a stationary distribution, that distribution may not coincide with the (normalized) occupancy measure of the MDP. We observe a relationship between the MDP and PageRank. Using this observation, we construct an MP whose stationary distribution coincides with the normalized occupancy measure of the MDP, and we define the hitting time of the MDP as the hitting time of the associated MP.
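Once an associated Markov process is fixed, its expected hitting times follow from the standard first-step equations. The sketch below is only an illustration under assumptions: the associated process is taken to be a PageRank-style restart chain gamma * P + (1 - gamma) * 1 mu^T, which the abstract does not spell out, and `P`, `mu`, `gamma`, and the function names are hypothetical.

```python
def restart_chain(P, mu, gamma):
    """Assumed associated chain: follow P w.p. gamma, restart from mu w.p. 1 - gamma."""
    n = len(P)
    return [[gamma * P[i][j] + (1.0 - gamma) * mu[j] for j in range(n)]
            for i in range(n)]

def expected_hitting_times(G, target, tol=1e-12, max_iters=100000):
    """Expected number of steps to first reach `target` from each state.

    Iterates the first-step equations
        h[i] = 1 + sum_{j != target} G[i][j] * h[j],   h[target] = 0,
    and assumes `target` is reachable from every state so the iteration converges.
    """
    n = len(G)
    h = [0.0] * n
    for _ in range(max_iters):
        new_h = [0.0 if i == target else
                 1.0 + sum(G[i][j] * h[j] for j in range(n) if j != target)
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return new_h
        h = new_h
    raise RuntimeError("hitting-time iteration did not converge")

# Hypothetical chain induced by a fixed policy, with initial distribution mu.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
mu = [1.0, 0.0, 0.0]
gamma = 0.9

G = restart_chain(P, mu, gamma)
h = expected_hitting_times(G, target=2)
print("expected steps to reach state 2:", h)
```

Fixed-point iteration is used here to keep the sketch dependency-free. For larger state spaces, solving the linear system (I - Q) h = 1 directly, where Q is `G` restricted to the non-target states, would be the more usual choice.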

Taxonomy

Topics: Data Management and Algorithms · Advanced Text Analysis Techniques · Bayesian Modeling and Causal Inference