# Computing monotone policies for Markov decision processes: a nearly-isotonic penalty approach

**Authors:** Robert Mattila, Cristian R. Rojas, Vikram Krishnamurthy, Bo Wahlberg

arXiv: 1704.00621 · 2017-04-04

## TL;DR

This paper proposes a two-stage alternating convex optimization scheme that uses a nearly-isotonic penalty to exploit monotone structure when computing optimal policies for Markov decision processes. In numerical simulations, the regularization step significantly accelerates ADMM.

## Contribution

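The paper proposes an alternating convex optimization scheme for MDPs whose optimal policies are monotone. One stage solves a linear program over the joint state-action probabilities; the other solves a problem over the conditional probabilities of actions given states, regularized with a penalty borrowed from nearly-isotonic regression.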
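For orientation, the standard nearly-isotonic regression problem (as commonly stated in the statistics literature) is reproduced below. This summary does not say exactly how the paper adapts the penalty to policies, so the symbols $y$, $\beta$, and $\lambda$ here are the generic ones from that literature, not the paper's notation.

$$
\hat{\beta}_{\lambda} = \arg\min_{\beta \in \mathbb{R}^{n}} \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} (\beta_i - \beta_{i+1})_{+}, \qquad (u)_{+} = \max(u, 0).
$$

The penalty term is zero exactly when $\beta_1 \le \beta_2 \le \dots \le \beta_n$, so it penalizes violations of monotonicity without imposing monotonicity as a hard constraint.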

## Key findings

- A variety of iterative methods can be used for the first-stage linear program; the paper focuses on ADMM.
- In numerical simulations, ADMM is significantly accelerated when the nearly-isotonic regularization step is added.
- The two-stage scheme exploits the monotone property to speed up the search for an optimal policy.

## Abstract

This paper discusses algorithms for solving Markov decision processes (MDPs) that have monotone optimal policies. We propose a two-stage alternating convex optimization scheme that can accelerate the search for an optimal policy by exploiting the monotone property. The first stage is a linear program formulated in terms of the joint state-action probabilities. The second stage is a regularized problem formulated in terms of the conditional probabilities of actions given states. The regularization uses techniques from nearly-isotonic regression. While a variety of iterative methods can be used in the first formulation of the problem, we show in numerical simulations that, in particular, the alternating direction method of multipliers (ADMM) can be significantly accelerated using the regularization step.
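The two parametrizations named in the abstract can be illustrated with a small sketch. It converts joint state-action probabilities into conditional action probabilities and evaluates a nearly-isotonic style penalty on them. The function names, the uniform fallback for unvisited states, the two-action example, and the exact penalty form are all assumptions made for illustration; the paper's own formulation may differ, and no optimization is performed here.

```python
def conditional_policy(joint):
    """Convert joint[x][a] = P(state=x, action=a) into theta[x][a] = P(a | x)."""
    theta = []
    for row in joint:
        mass = sum(row)  # marginal probability of the state
        if mass > 0:
            theta.append([p / mass for p in row])
        else:
            # Assumption: a state with zero mass gets a uniform conditional,
            # since P(a | x) is undefined there.
            theta.append([1.0 / len(row)] * len(row))
    return theta


def nearly_isotonic_penalty(values, lam=1.0):
    """Return lam * sum_i max(values[i] - values[i+1], 0).

    The penalty is zero when `values` is nondecreasing and grows with
    each downward step.
    """
    return lam * sum(max(a - b, 0.0) for a, b in zip(values, values[1:]))


# Hypothetical joint distribution over 4 ordered states and 2 actions.
joint = [
    [0.25, 0.00],
    [0.20, 0.05],
    [0.05, 0.20],
    [0.00, 0.25],
]

theta = conditional_policy(joint)
# Probability of choosing action 1 in each state, in state order.
prob_action_1 = [row[1] for row in theta]

# Nondecreasing in the state, so the penalty vanishes.
print(nearly_isotonic_penalty(prob_action_1))

# A non-monotone sequence incurs a positive penalty.
print(nearly_isotonic_penalty([0.0, 0.8, 0.2, 1.0]))
```

A hinge-type penalty of this kind is convex in `values`, which is consistent with using it as a regularizer in a convex second stage.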

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.00621/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1704.00621/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1704.00621/full.md

---
Source: https://tomesphere.com/paper/1704.00621