Epsilon-Optimal Policies for Average-Cost Separable MDPs with Perturbations

Dhairya Kantawala

arXiv:2510.23335·math.OC·October 28, 2025

TL;DR

This paper studies average-cost Markov Decision Processes with nearly separable reward and transition structures, and shows that the policy that is optimal for the separable baseline loses only O(epsilon) in average reward under an epsilon-perturbation.

Contribution

It provides an explicit stationary policy that is exactly average-optimal for the totally separable MDP and proves that this policy remains epsilon-optimal under an epsilon-perturbation of the separable structure.

Findings

01 Explicit stationary decision rule for separable MDPs

02 Epsilon-optimality of the policy under perturbations

03 Bound on average reward loss proportional to epsilon

Abstract

We study a class of infinite-horizon average-cost Markov Decision Processes (MDPs) whose reward and transition structures are nearly separable. For the totally separable baseline (that is, with no perturbation), we derive an explicit stationary decision rule that is exactly average-optimal. We then show that under an epsilon-perturbation of the separable structure, this policy remains epsilon-optimal, meaning that the loss in the average reward is of order O(epsilon).
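The notion of epsilon-optimality in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's construction: the abstract does not spell out the separable structure, so the baseline here is a generic random MDP, and the perturbation model (mixing transitions with a second random kernel and shifting rewards by eps) is an assumption made for illustration. The helper names `average_reward`, `best_policy`, and `optimality_gap` are hypothetical. The sketch finds the baseline-optimal stationary policy by brute force, then measures how much average reward that policy loses in the perturbed MDP.

```python
import itertools
import numpy as np

def average_reward(P, r, policy):
    """Long-run average reward of a stationary deterministic policy.

    P: (A, S, S) transition kernels, r: (S, A) rewards,
    policy: length-S tuple of actions.
    Assumes the induced chain has a unique stationary distribution.
    """
    S = r.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(S)])
    r_pi = np.array([r[s, policy[s]] for s in range(S)])
    # Solve mu P_pi = mu together with sum(mu) = 1 by least squares.
    lhs = np.vstack([P_pi.T - np.eye(S), np.ones(S)])
    rhs = np.zeros(S + 1)
    rhs[-1] = 1.0
    mu, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return float(mu @ r_pi)

def best_policy(P, r):
    """Brute-force search over all stationary deterministic policies."""
    A, S, _ = P.shape
    policies = itertools.product(range(A), repeat=S)
    return max(policies, key=lambda pi: average_reward(P, r, pi))

def optimality_gap(eps, seed=0, S=3, A=2):
    """Average-reward loss of the baseline-optimal policy in the perturbed MDP."""
    rng = np.random.default_rng(seed)
    # Baseline MDP (generic stand-in for the separable baseline).
    P0 = rng.dirichlet(np.ones(S), size=(A, S))   # shape (A, S, S)
    r0 = rng.uniform(size=(S, A))
    # Assumed perturbation model: an eps-sized change to kernels and rewards.
    Q = rng.dirichlet(np.ones(S), size=(A, S))
    dr = rng.uniform(-1.0, 1.0, size=(S, A))
    P_eps = (1 - eps) * P0 + eps * Q
    r_eps = r0 + eps * dr
    pi0 = best_policy(P0, r0)
    pi_eps = best_policy(P_eps, r_eps)
    return average_reward(P_eps, r_eps, pi_eps) - average_reward(P_eps, r_eps, pi0)

if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.05, 0.1):
        print(eps, optimality_gap(eps))
```

The gap is zero when eps is zero and is never negative, since the perturbed optimum is taken over the same finite set of policies. How the gap scales with eps in this toy setting is only suggestive; the O(epsilon) guarantee itself is the paper's result for its nearly separable class.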
