# Convergence Rates of Posterior Distributions in Markov Decision Process

**Authors:** Zhen Li, Eric Laber

arXiv: 1907.09083 · 2019-07-23

## TL;DR

This paper derives convergence rates for the posterior distributions of the model dynamics, the mean accumulative reward, and the regret in Markov decision processes (MDPs), extends the results to Markov games, and reports three simulation scenarios.

## Contribution

The paper analyzes posterior convergence rates in MDPs with general state and action spaces, allows the parameter space of the dynamics to be infinite dimensional, and extends the results to Markov games.

## Key findings

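- Convergence rates are shown for the posterior distributions of the model dynamics in both episodic and continuous tasks.
- Convergence rates are also shown for the posterior of the mean accumulative reward, under a fixed or the optimal policy, and for the regret bound.
- A proposed variant of Thompson sampling provides both the posterior convergence rates for the dynamics and a regret-type bound.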
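To make the last finding concrete, here is a minimal sketch of generic Thompson (posterior) sampling in a small tabular MDP. It is not the variant proposed in the paper, whose details are not given in this summary. The Dirichlet prior, the known reward table `R`, the discounted value iteration, the fixed start state, and every function and variable name are assumptions made for illustration only.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Return a greedy policy for transitions P[s, a, s'] and rewards R[s, a]."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * P @ V  # Q has shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def thompson_sampling(P_true, R, episodes=200, horizon=20, seed=0):
    """Posterior sampling with a Dirichlet posterior over each transition row.

    Assumption: rewards R are known, so only the dynamics are learned.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P_true.shape
    counts = np.ones((S, A, S))  # Dirichlet(1, ..., 1) prior pseudo-counts
    for _ in range(episodes):
        # Draw one plausible model from the current posterior.
        P_sample = np.array(
            [[rng.dirichlet(counts[s, a]) for a in range(A)] for s in range(S)]
        )
        # Act optimally for the sampled model during this episode.
        policy = value_iteration(P_sample, R)
        s = 0  # assumed fixed start state
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choice(S, p=P_true[s, a])
            counts[s, a, s_next] += 1  # conjugate posterior update
            s = s_next
    return counts

# Hypothetical 3-state, 2-action environment.
rng = np.random.default_rng(1)
S, A = 3, 2
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
R = rng.uniform(size=(S, A))

counts = thompson_sampling(P_true, R)
posterior_mean = counts / counts.sum(axis=2, keepdims=True)

# Error of the posterior mean on state-action pairs that were actually visited.
visited = counts.sum(axis=2) > S  # the prior contributes exactly S pseudo-counts
print(np.abs(posterior_mean - P_true)[visited].max())
```

Because the sampled policy decides which state-action pairs are visited, the posterior in this sketch can only concentrate on pairs the agent keeps visiting; rarely chosen actions retain a diffuse posterior.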

## Abstract

In this paper, we show the convergence rates of posterior distributions of the model dynamics in an MDP for both episodic and continuous tasks. The theoretical results hold for general state and action spaces, and the parameter space of the dynamics can be infinite dimensional. Moreover, we show the convergence rates of posterior distributions of the mean accumulative reward under a fixed or the optimal policy and of the regret bound. A variant of the Thompson sampling algorithm is proposed, which provides both posterior convergence rates for the dynamics and the regret-type bound. The previous results are then extended to Markov games. Finally, we show numerical results with three simulation scenarios and conclude with a discussion.
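For readers unfamiliar with the term, a posterior convergence (contraction) rate is usually defined as below in the Bayesian asymptotics literature. The symbols are assumptions introduced here, not notation taken from the paper: $\theta_0$ is the true dynamics parameter, $d$ a metric on the parameter space, $\Pi(\cdot \mid D_n)$ the posterior given $n$ observations $D_n$, and $M$ a sufficiently large constant. A sequence $\epsilon_n \to 0$ is a convergence rate if

```latex
\Pi\bigl(\theta : d(\theta, \theta_0) > M \epsilon_n \,\big|\, D_n\bigr) \longrightarrow 0
\quad \text{in } P_{\theta_0}\text{-probability as } n \to \infty .
```

The metric and the conditions under which the paper obtains its rates are given in the full text, not in this summary.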

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.09083/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1907.09083/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1907.09083/full.md

---
Source: https://tomesphere.com/paper/1907.09083