Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

Julien Grand-Clément; Marek Petrik; Nicolas Vieille

arXiv:2312.03618 · math.OC · January 15, 2025 · 1 citation

Open Access

TL;DR

This paper advances the theory of Robust Markov Decision Processes (RMDPs) by studying average and Blackwell optimality. It establishes new existence results and proposes algorithms, with implications for sequential decision-making under uncertainty.

Contribution

It provides foundational results for RMDPs beyond discounted returns, including existence conditions for average and Blackwell optimal policies. It also introduces algorithms that exploit the connection between RMDPs and stochastic games.

Findings

1. Average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs.
2. Average optimal policies may not exist for s-rectangular RMDPs, and when they do exist they may need to be history-dependent.
3. ε-Blackwell optimal policies always exist under certain conditions.
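The sa-rectangular setting in finding 1 can be made concrete with discounted robust value iteration. Under sa-rectangularity, nature picks the worst-case next-state distribution independently for each state-action pair. It is a standard result that a deterministic stationary policy is optimal for discounted sa-rectangular RMDPs. The following is a minimal sketch under assumptions. The uncertainty sets are finite lists of candidate distributions, and the function names and the two-state example are hypothetical. It is not the paper's average-return algorithm.

```python
import numpy as np

def robust_q_values(v, rewards, uncertainty, gamma):
    """Robust Q-values: nature picks the worst distribution separately for each (s, a)."""
    return [
        [rewards[s][a] + gamma * min(float(np.dot(p, v)) for p in uncertainty[s][a])
         for a in range(len(rewards[s]))]
        for s in range(len(rewards))
    ]

def robust_value_iteration(rewards, uncertainty, gamma, tol=1e-10, max_iter=100_000):
    """Discounted robust value iteration for a finite sa-rectangular RMDP (sketch).

    rewards[s][a]     -- reward for taking action a in state s
    uncertainty[s][a] -- list of candidate next-state distributions for (s, a);
                         a finite uncertainty set is an assumption of this sketch
    Returns the robust value vector and a deterministic stationary policy.
    """
    v = np.zeros(len(rewards))
    for _ in range(max_iter):
        q = robust_q_values(v, rewards, uncertainty, gamma)
        v_new = np.array([max(q_s) for q_s in q])
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Greedy deterministic policy with respect to the (approximately) converged value.
    q = robust_q_values(v, rewards, uncertainty, gamma)
    policy = [int(np.argmax(q_s)) for q_s in q]
    return v, policy

# Hypothetical 2-state example. State 1 is absorbing with reward 0.
# In state 0, action 0 ("safe") earns 1 and stays put; action 1 ("risky") earns 1.5,
# but nature may either keep the agent in state 0 or push it into state 1.
rewards = [[1.0, 1.5], [0.0]]
uncertainty = [
    [[np.array([1.0, 0.0])], [np.array([1.0, 0.0]), np.array([0.0, 1.0])]],
    [[np.array([0.0, 1.0])]],
]
v, policy = robust_value_iteration(rewards, uncertainty, gamma=0.9)
print(v, policy)
```

With γ = 0.9, the robust value of state 0 is 1/(1 − 0.9) = 10 under the safe action. The risky action is worth only 1.5 once nature pushes the agent into the absorbing state, so the robust policy picks the safe action. Viewing nature as a second player that moves after the agent is one way to see the link to stochastic games.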

Abstract

Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known about average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs. Perhaps surprisingly, we also show that for s-rectangular RMDPs average optimal policies may not exist, and if they exist, may need to be history-dependent (Markovian). We also study Blackwell optimality for sa-rectangular RMDPs, where we show that…
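The two criteria defined in the abstract can differ from plain discounted optimality, even without uncertainty. The sketch below uses a hypothetical nominal MDP with a fixed transition matrix and no adversary. From state 0, a "greedy" policy collects 2 once and then earns 0 forever. A "patient" policy earns 0 once and then 1 forever. Their discounted values from state 0 are 2 and γ/(1 − γ). The greedy policy is therefore discount optimal for γ < 2/3, and the patient policy is discount optimal for every γ > 2/3, which makes it Blackwell optimal here. The patient policy also has the higher long-run average (1 versus 0). The helper names are illustrative assumptions, and the Cesàro mean over a finite horizon only approximates the average return.

```python
import numpy as np

def discounted_value(P, r, gamma):
    """Solve v = r + gamma * P v, the discounted value of a fixed stationary policy."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

def average_return(P, r, horizon=10_000):
    """Cesaro mean (1/T) * sum_{t<T} P^t r, approximating the long-run average reward."""
    total = np.zeros_like(r)
    step_reward = r.copy()  # P^t r, starting at t = 0
    for _ in range(horizon):
        total += step_reward
        step_reward = P @ step_reward
    return total / horizon

# Hypothetical 3-state MDP: state 0 is the start, state 1 is absorbing with reward 0,
# state 2 is absorbing with reward 1. Each policy fixes the action taken in state 0.
# "greedy": collect 2 in state 0, then move to state 1.
P_greedy = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r_greedy = np.array([2.0, 0.0, 1.0])
# "patient": collect 0 in state 0, then move to state 2.
P_patient = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r_patient = np.array([0.0, 0.0, 1.0])

for gamma in (0.5, 0.9, 0.99):
    v_g = discounted_value(P_greedy, r_greedy, gamma)[0]
    v_p = discounted_value(P_patient, r_patient, gamma)[0]
    print(f"gamma={gamma}: greedy {v_g:.2f}, patient {v_p:.2f}")

print("average:", average_return(P_greedy, r_greedy)[0], average_return(P_patient, r_patient)[0])
```

In an RMDP, nature would additionally choose the transitions adversarially. The abstract's results concern how such choices interact with these two criteria.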

Taxonomy

Topics: Supply Chain and Inventory Management · Reinforcement Learning in Robotics · Auction Theory and Applications