Linear Algebraic Truncation Algorithm with A Posteriori Error Bounds for Computing Markov Chain Equilibrium Gradients
Saied Mahdian, Peter W. Glynn

TL;DR
This paper introduces a method for computing equilibrium reward gradients of Markov chains with large or infinite state spaces, and for bounding the error caused by truncating the state space, using regeneration and Lyapunov functions.
Contribution
It provides the first computable a posteriori error bounds for equilibrium reward gradients that incorporate truncation errors in Markov chain models.
Findings
Highly accurate bounds with moderate truncation sets
Extension of method to Markov jump processes
Effective use of regeneration and Lyapunov functions
Abstract
The numerical computation of equilibrium reward gradients for Markov chains appears in many applications, for example within the policy improvement step arising in connection with average reward stochastic dynamic programming. When the state space is large or infinite, one will typically need to truncate the state space in order to arrive at a numerically tractable formulation. In this paper, we derive the first computable a posteriori error bounds for equilibrium reward gradients that account for the error induced by the truncation. Our approach uses regeneration to express equilibrium quantities in terms of the expectations of cumulative rewards over regenerative cycles. Lyapunov functions are then used to bound the contributions to these cumulative rewards and their gradients from path excursions that take the chain outside the truncation set. Our numerical results indicate that our…
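
The regenerative representation mentioned in the abstract can be illustrated on a toy example. The sketch below is not the authors' algorithm: it leaves out the Lyapunov-function error bounds entirely and only shows how the standard ratio formula alpha = E_z[sum of r(X_n) over one cycle] / E_z[cycle length], together with its derivative in a parameter theta, reduces to linear solves on a truncation set. The chain (a reflected random walk on the nonnegative integers with up-probability theta), the reward r(x) = x, the regeneration state z = 0, and every function name are assumptions chosen for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def truncated_reward_and_gradient(theta, n):
    """Approximate the equilibrium reward and its theta-derivative.

    Hypothetical model: from state x the walk moves up with probability
    theta and down with probability 1 - theta (at 0 it stays put instead
    of moving down). Truncation set A = {0, ..., n-1}, regeneration
    state z = 0, reward r(x) = x.
    """
    # B keeps only transitions that stay inside A and do not enter state 0,
    # so (I - B)^{-1} accumulates quantities over one cycle, with excursions
    # that leave A simply cut off. dB is the entrywise derivative of B.
    B = [[0.0] * n for _ in range(n)]
    dB = [[0.0] * n for _ in range(n)]
    for x in range(n):
        if x + 1 < n:
            B[x][x + 1], dB[x][x + 1] = theta, 1.0
        if x - 1 >= 1:
            B[x][x - 1], dB[x][x - 1] = 1.0 - theta, -1.0
    I_minus_B = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(n)]
                 for i in range(n)]

    r = [float(x) for x in range(n)]
    ones = [1.0] * n
    u = solve(I_minus_B, r)        # u[x]: cumulative reward until return to 0
    ell = solve(I_minus_B, ones)   # ell[x]: number of steps until return to 0
    # Differentiating u = r + B u gives du = (I - B)^{-1} dB u; same for ell.
    du = solve(I_minus_B, matvec(dB, u))
    dell = solve(I_minus_B, matvec(dB, ell))

    alpha = u[0] / ell[0]
    grad = (du[0] * ell[0] - u[0] * dell[0]) / ell[0] ** 2  # quotient rule
    return alpha, grad


alpha, grad = truncated_reward_and_gradient(theta=0.3, n=40)
print(alpha, grad)
```

For this particular walk with theta < 1/2 the untruncated equilibrium reward has the closed form theta / (1 - 2 theta), with derivative 1 / (1 - 2 theta)^2, so the truncated values at theta = 0.3 can be compared against 0.75 and 6.25. The sketch gives no guarantee on how close they are; supplying such a computable guarantee is the contribution the abstract describes.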
