Epsilon-Optimal Policies for Average-Cost Separable MDPs with Perturbations
Dhairya Kantawala

TL;DR
This paper develops epsilon-optimal policies for average-cost Markov Decision Processes with nearly separable reward and transition structures, demonstrating robustness of the optimal policy under small perturbations.
Contribution
It provides an explicit stationary policy that is exactly average-optimal for totally separable MDPs and proves that the same policy remains epsilon-optimal under epsilon-perturbations of the separable structure.
Findings
Explicit stationary decision rule for separable MDPs
Epsilon-optimality of the policy under perturbations
Bound on average reward loss proportional to epsilon
Abstract
We study a class of infinite-horizon average-cost Markov Decision Processes (MDPs) whose reward and transition structures are nearly separable. For the totally separable baseline (that is, with no perturbation), we derive an explicit stationary decision rule that is exactly average-optimal. We then show that under an epsilon-perturbation of the separable structure, this policy remains epsilon-optimal, meaning that the loss in the average reward is of order O(epsilon).
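The epsilon-optimality claim can be probed numerically on a toy model. The sketch below is only an illustration under stated assumptions, not the paper's construction. A small random ergodic MDP stands in for the unperturbed baseline, since the abstract does not spell out the separable structure. The perturbation is assumed to mix the transition kernel with a second kernel `Q` and to shift the rewards by `eps * D`. The helper names `average_reward`, `best_policy`, and `random_kernel` are hypothetical. The baseline-optimal policy `pi0` is evaluated in each perturbed model and compared with that model's own optimum, found by brute-force enumeration.

```python
import itertools
import numpy as np


def average_reward(P, r, policy):
    """Long-run average reward of a deterministic stationary policy.

    P[a, s, s2] are transition probabilities and r[s, a] are rewards.
    Assumes the induced chain has a unique stationary distribution.
    """
    n = r.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(n)])
    r_pi = np.array([r[s, policy[s]] for s in range(n)])
    # Solve mu P_pi = mu together with sum(mu) = 1 by least squares.
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(mu @ r_pi)


def best_policy(P, r):
    """Brute-force search over deterministic stationary policies."""
    n, m = r.shape
    policies = itertools.product(range(m), repeat=n)
    return max(policies, key=lambda pi: average_reward(P, r, pi))


rng = np.random.default_rng(0)
n_states, n_actions = 3, 2


def random_kernel():
    # Strictly positive rows, so every induced chain is ergodic.
    K = rng.random((n_actions, n_states, n_states)) + 0.1
    return K / K.sum(axis=2, keepdims=True)


# Assumed toy data: baseline model (P0, r0) and perturbation directions (Q, D).
P0, Q = random_kernel(), random_kernel()
r0 = rng.random((n_states, n_actions))
D = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))

pi0 = best_policy(P0, r0)  # optimal for the unperturbed model

losses = {}
for eps in (0.0, 0.01, 0.05, 0.1):
    P_eps = (1 - eps) * P0 + eps * Q
    r_eps = r0 + eps * D
    g_star = average_reward(P_eps, r_eps, best_policy(P_eps, r_eps))
    g_pi0 = average_reward(P_eps, r_eps, pi0)
    losses[eps] = g_star - g_pi0
    print(f"eps={eps:<5} loss={losses[eps]:.6f}")
```

The printed loss is the gap between the perturbed model's optimal average reward and the average reward earned by `pi0`. It is zero at `eps = 0` and never negative. Enumeration is only feasible for very small models; it is used here to keep the check transparent.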
