Infinite Horizon Average Cost Dynamic Programming Subject to Total Variation Distance Ambiguity
Ioannis Tzortzis, Charalambos D. Charalambous, Themistoklis Charalambous

TL;DR
This paper develops new dynamic programming equations and policy iteration algorithms for infinite horizon average cost Markov control models under total variation distance ambiguity, with applications demonstrated through examples.
Contribution
It introduces generalized dynamic programming equations and policy iteration algorithms accounting for total variation ambiguity in controlled Markov processes.
Findings
New policy iteration algorithms using water filling solutions.
Conditions for irreducibility of maximizing distributions.
Application to finite and Borel space models.
Abstract
We analyze the infinite horizon minimax average cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known nominal controlled conditional distribution with radius $R$, in which the minimization is over the control strategies and the maximization is over conditional distributions. Upon performing the maximization, a dynamic programming equation is obtained which includes, in addition to the standard terms, the oscillator semi-norm of the cost-to-go. First, the dynamic programming equation is analyzed for finite state and control spaces. We show that if the nominal controlled process distribution is irreducible, then for every stationary Markov control policy the maximizing conditional distribution of the controlled process is also irreducible for $R \in…
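To make the maximization step concrete, here is a minimal sketch of a water-filling style computation on a finite state space. It is an illustration under stated assumptions, not the paper's algorithm: the total variation distance is taken as the sum of absolute differences between the two probability vectors, and the names `tv_worst_case`, `ell`, `mu`, and `R` are hypothetical. The idea is to move up to R/2 of probability mass onto the state with the largest cost-to-go, taking it from the states with the smallest cost-to-go first. When the lowest-cost state holds at least R/2 of nominal mass, the worst-case value equals the nominal expectation plus (R/2)(max - min), which is where an oscillation term of the cost-to-go enters.

```python
import numpy as np

def tv_worst_case(ell, mu, R):
    """Maximize sum(ell * nu) over distributions nu with sum|nu - mu| <= R.

    Illustrative helper (hypothetical name), finite state space only.
    ell : cost-to-go values per state
    mu  : nominal probability vector
    R   : total variation radius (sum-of-absolute-differences convention)
    """
    ell = np.asarray(ell, dtype=float)
    mu = np.asarray(mu, dtype=float)
    nu = mu.copy()

    # State that receives the extra mass: the one with the largest cost-to-go.
    top = int(np.argmax(ell))

    # Mass to move: at most R/2, and no more than the mass outside `top`.
    alpha = min(R / 2.0, 1.0 - mu[top])
    nu[top] += alpha

    # Remove the same amount of mass, starting from the lowest-cost states.
    remaining = alpha
    for i in np.argsort(ell):
        if remaining <= 0.0:
            break
        if i == top:
            continue
        take = min(nu[i], remaining)
        nu[i] -= take
        remaining -= take

    return nu, float(ell @ nu)

# Example with assumed numbers: three states, radius R = 0.4.
ell = [1.0, 2.0, 4.0]
mu = [0.5, 0.3, 0.2]
nu, value = tv_worst_case(ell, mu, 0.4)
print(nu, value)
```

In this example the nominal expectation is 1.9 and the lowest-cost state carries enough mass, so the worst-case value can be checked against 1.9 + 0.2 * (4 - 1).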
