Reversible Markov decision processes and the Gaussian free field
Venkat Anantharam

TL;DR
This paper characterizes reversible Markov decision processes with finite states and actions, simplifies policy iteration for such problems, and explores their connection to the Gaussian free field.
Contribution
It provides a complete characterization of reversible MDPs with finite state and action spaces, shows that policy iteration simplifies for them, and relates their finite-time reward accrual to the Gaussian free field of the controlled Markov chain.
Findings
All discrete-time reversible MDPs with finite state and action spaces are characterized.
Policy iteration can be significantly simplified for these MDPs.
Finite-time reward evolution relates to the Gaussian free field.
Abstract
A Markov decision problem is called reversible if the stationary controlled Markov chain is reversible under every stationary Markovian strategy. A natural application in which such problems arise is in the control of Metropolis-Hastings type dynamics. We characterize all discrete-time reversible Markov decision processes with finite state and action spaces. We show that the policy iteration algorithm for finding an optimal policy can be significantly simplified for Markov decision problems of this type. We also highlight the relation between the finite-time evolution of the accrual of reward and the Gaussian free field associated to the controlled Markov chain.
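The reversibility condition in the abstract can be illustrated for a single fixed policy, where it reduces to detailed balance: pi[i] * P[i][j] == pi[j] * P[j][i] for all states i, j. The following is a minimal sketch, not the paper's construction. The function names, the three-state target distribution, and the uniform symmetric proposal are all assumptions chosen for illustration. It builds a Metropolis-type chain, checks detailed balance, and contrasts it with a deterministic cycle that is not reversible.

```python
def metropolis_matrix(pi, proposal):
    """Transition matrix of a Metropolis chain (hypothetical helper).

    pi: target distribution over a finite state space.
    proposal: symmetric proposal matrix with zero diagonal (an assumption).
    """
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Accept a proposed move i -> j with probability min(1, pi_j / pi_i).
                P[i][j] = proposal[i][j] * min(1.0, pi[j] / pi[i])
        # Rejected proposals keep the chain in state i.
        P[i][i] = 1.0 - sum(P[i])
    return P


def is_reversible(P, pi, tol=1e-12):
    """Check detailed balance pi_i P_ij == pi_j P_ji up to a tolerance."""
    n = len(pi)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Illustrative data (assumed, not from the paper).
pi = [0.2, 0.3, 0.5]
proposal = [[0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5],
            [0.5, 0.5, 0.0]]
P_metropolis = metropolis_matrix(pi, proposal)

# A deterministic 3-cycle has the uniform stationary distribution
# but violates detailed balance.
P_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]
uniform = [1.0 / 3.0] * 3

metropolis_ok = is_reversible(P_metropolis, pi)
cycle_ok = is_reversible(P_cycle, uniform)
```

In this sketch `metropolis_ok` is true and `cycle_ok` is false. For a reversible Markov decision problem in the abstract's sense, a check of this kind would have to pass for the chain induced by every stationary Markovian strategy, each with respect to its own stationary distribution.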
Taxonomy
Topics: Game Theory and Applications · Reinforcement Learning in Robotics
