# A convex programming approach for discrete-time Markov decision processes under the expected total reward criterion

**Authors:** F. Dufour (CQFD), Alexandre Genadot (CQFD)

arXiv: 1903.08853 · 2019-05-10

## TL;DR

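This paper introduces a convex programming formulation for constrained discrete-time Markov decision processes with Borel state and action spaces under the expected total reward criterion. It shows that the convex program and the constrained control problem have the same value, and that an optimal solution of the program yields an optimal stationary randomized policy.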

## Contribution

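The paper formulates a convex program for constrained MDPs with Borel spaces and proves that, whenever this program has an optimal solution, the MDP admits an optimal stationary randomized policy. The results hold under continuity-compactness conditions and a Slater-type condition.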

## Key findings

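- The value of the convex program coincides with the value of the constrained control problem.
- If the convex program has an optimal solution, there exists a stationary randomized policy that is optimal for the MDP.
- For constrained control problems, the supremum of the expected total rewards over all randomized policies equals the supremum over stationary randomized policies.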
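For orientation, the kind of problem these findings refer to can be sketched as follows. The notation is an assumption made for this summary and is not taken from the paper: $\nu$ is the initial distribution, $Q$ the transition kernel on the state space $X$ given a state-action pair, $r_0$ the reward to maximize, and $r_1,\dots,r_q$ the constraint rewards with thresholds $\theta_1,\dots,\theta_q$. The second display is the classical occupation-measure form from the literature; the convex program studied in the paper may be stated differently.

```latex
% Constrained control problem (generic form; notation assumed here):
\[
  \sup_{\pi}\; \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} r_0(x_t,a_t)\Big]
  \quad\text{subject to}\quad
  \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} r_i(x_t,a_t)\Big] \ge \theta_i,
  \qquad i=1,\dots,q.
\]
% Classical occupation-measure counterpart, with \mu a measure on X \times A
% and B ranging over measurable subsets of X:
\[
  \sup_{\mu}\; \int r_0 \,\mathrm{d}\mu
  \quad\text{subject to}\quad
  \int r_i \,\mathrm{d}\mu \ge \theta_i,
  \qquad
  \mu(B \times A) = \nu(B) + \int_{X\times A} Q(B \mid x,a)\,\mu(\mathrm{d}x,\mathrm{d}a).
\]
```

In this form the objective and the constraints are linear in $\mu$, which is what makes a convex programming treatment natural.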

## Abstract

In this work, we study discrete-time Markov decision processes (MDPs) under constraints with Borel state and action spaces and where all the performance functions have the same form of the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDPs. It will be shown that the values of the constrained control problem and the associated convex program coincide and that if there exists an optimal solution to the convex program then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that in the framework of constrained control problems, the supremum of the expected total rewards over the set of randomized policies is equal to the supremum of the expected total rewards over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to deal with cases that have not yet been addressed in the literature. An example is presented to illustrate our results with respect to those of the literature.
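The link between stationary randomized policies and measures on state-action pairs can be illustrated on a toy finite model. The sketch below is not from the paper: the model, the names `STAY_PROB`, `REWARD`, `occupation_measure`, `total_reward` and `policy_from_measure`, and all numbers are hypothetical. It assumes one transient state plus a reward-free absorbing state, so the occupation measure is finite and can be computed in closed form.

```python
from fractions import Fraction

# Hypothetical toy model (not from the paper): a single transient state and a
# reward-free absorbing state. Action 0 keeps the process in the transient
# state with probability 1/2; action 1 sends it to the absorbing state at once.
STAY_PROB = {0: Fraction(1, 2), 1: Fraction(0)}
REWARD = {0: Fraction(1), 1: Fraction(0)}


def occupation_measure(p):
    """Occupation measure of the stationary policy playing action 0 w.p. p."""
    p = Fraction(p)
    policy = {0: p, 1: 1 - p}
    # Probability of still being in the transient state one step later.
    stay = sum(policy[a] * STAY_PROB[a] for a in policy)
    # Expected number of visits to the transient state: sum over t of stay**t.
    visits = 1 / (1 - stay)
    return {a: visits * policy[a] for a in policy}


def total_reward(mu):
    """Expected total reward, written as the integral of the reward w.r.t. mu."""
    return sum(mu[a] * REWARD[a] for a in mu)


def policy_from_measure(mu):
    """Recover a stationary randomized policy by normalizing mu over actions."""
    mass = sum(mu.values())
    return {a: mu[a] / mass for a in mu}
```

For example, `occupation_measure(Fraction(1, 2))` gives mass 2/3 to each action, the expected total reward is 2/3, and normalizing the measure returns the policy that plays each action with probability 1/2. Exact fractions are used so that these identities hold without rounding error.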

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.08853/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1903.08853/full.md

---
Source: https://tomesphere.com/paper/1903.08853