# Minimizing the Outage Probability in a Markov Decision Process

**Authors:** Vincent Corlay, Jean-Christophe Sibel

arXiv: 2302.14714 · 2023-03-06

## TL;DR

This paper introduces an algorithm for Markov decision processes that maximizes the probability that the gain exceeds a given threshold (equivalently, minimizes the outage probability), rather than maximizing the expected gain as standard methods do.

## Contribution

The paper extends value iteration to optimize the outage probability and shows how the resulting algorithm could be generalized to use neural networks, much as deep Q-learning generalizes Q-learning.

## Key findings

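- The proposed algorithm optimizes the probability that the gain is greater than a given value, i.e. it minimizes the outage probability.
- The algorithm could be generalized to use neural networks, similarly to the deep Q-learning extension of Q-learning.
- Optimizing a threshold probability instead of the expected gain targets risk-sensitive decision making.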

## Abstract

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that makes it possible to optimize an alternative objective: the probability that the gain is greater than a given value. The algorithm can be seen as an extension of the value iteration algorithm. We also show how the proposed algorithm could be generalized to use neural networks, similarly to the deep Q-learning extension of Q-learning.
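
To make the objective concrete, here is a minimal sketch of a threshold-probability recursion on a toy problem. It is not the paper's algorithm (the full text is not reproduced in this summary). It is a standard finite-horizon dynamic program over the pair (state, remaining target), assuming integer rewards. The `TRANSITIONS` table, the function names `success_prob` and `best_action`, and the toy MDP itself are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical toy MDP (not from the paper): one state, two actions.
# TRANSITIONS[state][action] is a list of
# (next_state, probability, integer_reward) triples.
TRANSITIONS = {
    0: {
        "safe": [(0, 1.0, 1)],                # always +1
        "risky": [(0, 0.5, 3), (0, 0.5, 0)],  # +3 or 0, equally likely
    },
}


def action_value(state, action, steps_left, target):
    """Probability of reaching `target` if `action` is taken now and the
    remaining steps are played optimally for the threshold objective."""
    return sum(
        prob * success_prob(nxt, steps_left - 1, target - reward)
        for nxt, prob, reward in TRANSITIONS[state][action]
    )


@lru_cache(maxsize=None)
def success_prob(state, steps_left, target):
    """Maximum probability that the gain collected over the next
    `steps_left` steps is at least `target`."""
    if steps_left == 0:
        # No steps left: success only if the target is already met.
        return 1.0 if target <= 0 else 0.0
    return max(
        action_value(state, action, steps_left, target)
        for action in TRANSITIONS[state]
    )


def best_action(state, steps_left, target):
    """Action maximizing the success probability (minimizing the outage)."""
    return max(
        TRANSITIONS[state],
        key=lambda action: action_value(state, action, steps_left, target),
    )


if __name__ == "__main__":
    for target in (2, 3):
        p = success_prob(0, 2, target)
        print(target, best_action(0, 2, target), p, "outage:", 1.0 - p)
```

In this toy case, "risky" has the higher expected reward per step (1.5 versus 1), yet with two steps and a target of 2 the "safe" action reaches the target with certainty, while a target of 3 can only be met by playing "risky". This shows how the threshold objective can select a different policy than the expected-gain objective.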

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14714/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14714/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/2302.14714/full.md

---
Source: https://tomesphere.com/paper/2302.14714