# Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

**Authors:** Krishnendu Chatterjee, Raimundo Saona, Bruno Ziliotto

arXiv: 1904.13360 · 2022-09-29

## TL;DR

This paper shows that in POMDPs with long-run average objectives, the decision maker has finite-memory strategies that are approximately optimal. As a consequence, approximating the long-run value is recursively enumerable, and the value satisfies a weak continuity property with respect to the transition function.

## Contribution

The paper proves that in POMDPs with long-run average objectives, approximately optimal strategies with finite memory exist.
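The statement can be written out as a formula. The notation below is an assumption made for illustration and does not appear in this summary: $p_1$ is the initial belief, $\sigma$ a strategy, $r_m$ the reward at stage $m$, and the long-run average is taken as a $\liminf$, which is one common formalization of this objective.

```latex
\gamma_\infty^{p_1}(\sigma)
  = \mathbb{E}_{\sigma}^{p_1}\!\left[\liminf_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} r_m\right],
\qquad
v_\infty(p_1) = \sup_{\sigma}\, \gamma_\infty^{p_1}(\sigma).

% Approximate optimality with finite memory:
\forall \varepsilon > 0 \;\; \exists\, \sigma_\varepsilon \text{ with finite memory such that }
\gamma_\infty^{p_1}(\sigma_\varepsilon) \;\ge\; v_\infty(p_1) - \varepsilon .
```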

## Key findings

- Finite-memory strategies are approximately optimal in POMDPs with long-run average objectives.
- Approximating the long-run value is recursively enumerable.
- The value satisfies a weak continuity property with respect to the transition function.
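To make "finite-memory strategy" concrete, here is a minimal Python sketch, not taken from the paper. It models a deterministic strategy as a finite controller (an action per memory state and a memory update per observation) and computes its expected average reward over a long finite horizon by propagating the exact distribution over (state, memory) pairs. The toy two-state POMDP, the function name `average_reward`, and the finite horizon used as a stand-in for the long-run average are all assumptions made for illustration.

```python
from itertools import product


def average_reward(T, Z, R, init, act, upd, m0, horizon=2000):
    """Expected average reward over `horizon` steps of a finite-memory strategy.

    T[s][a][s2]: probability of moving from state s to s2 under action a
    Z[s2][o]:    probability of observing o on arrival in state s2
    R[s][a]:     reward for playing action a in state s
    init[s]:     initial distribution over states
    act[m]:      action played in memory state m
    upd[m][o]:   next memory state after observing o in memory state m
    m0:          initial memory state
    """
    n_states, n_obs = len(T), len(Z[0])
    # Exact distribution over (state, memory) pairs; no sampling involved.
    dist = {(s, m0): init[s] for s in range(n_states) if init[s] > 0}
    total = 0.0
    for _ in range(horizon):
        nxt = {}
        for (s, m), p in dist.items():
            a = act[m]
            total += p * R[s][a]
            for s2, o in product(range(n_states), range(n_obs)):
                q = p * T[s][a][s2] * Z[s2][o]
                if q > 0:
                    key = (s2, upd[m][o])
                    nxt[key] = nxt.get(key, 0.0) + q
        dist = nxt
    return total / horizon


# Hypothetical toy POMDP: the hidden state flips with probability 0.1,
# the observation reports the new state correctly with probability 0.8,
# and the reward is 1 when the action matches the hidden state.
T = [[[0.9, 0.1], [0.9, 0.1]], [[0.1, 0.9], [0.1, 0.9]]]
Z = [[0.8, 0.2], [0.2, 0.8]]
R = [[1.0, 0.0], [0.0, 1.0]]
init = [0.5, 0.5]

# One memory state: always play action 0.
constant = average_reward(T, Z, R, init, act=[0], upd=[[0, 0]], m0=0)
# Two memory states: remember the last observation and play it.
two_state = average_reward(T, Z, R, init, act=[0, 1], upd=[[0, 1], [0, 1]], m0=0)
print(constant, two_state)
```

In this toy model, the controller that remembers the last observation earns an average reward close to 0.8, against 0.5 for the constant strategy. Because a finite-memory strategy induces a finite Markov chain on (state, memory) pairs, its performance can be evaluated mechanically in this way, which is why such strategies yield computable lower bounds on the value.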

## Abstract

Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.13360/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/1904.13360/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1904.13360/full.md

---
Source: https://tomesphere.com/paper/1904.13360