# Learning Multiple Markov Chains via Adaptive Allocation

**Authors:** Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

arXiv: 1905.11128 · 2019-11-14

## TL;DR

This paper introduces an algorithm for learning the transition matrices of multiple unknown ergodic Markov chains. The learner samples the chains sequentially and balances exploration and exploitation so that all chains are learned uniformly well, without prior knowledge of the chains.

## Contribution

The paper defines a loss that extends the squared loss for learning distributions to Markov chains, and proposes an adaptive allocation algorithm with finite-sample PAC-type guarantees that asymptotically attains an optimal loss.

## Key findings

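- The algorithm selects chains sequentially so that all transition matrices are learned uniformly well with respect to the proposed loss.
- Finite-sample PAC-type guarantees are provided on the algorithm's performance.
- The algorithm asymptotically attains an optimal loss.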

## Abstract

We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain. We assume that the Markov chains are ergodic but otherwise unknown. The learner can sample Markov chains sequentially to observe their states. The goal of the learner is to sequentially select various chains to learn transition matrices uniformly well with respect to some loss function. We introduce a notion of loss that naturally extends the squared loss for learning distributions to the case of Markov chains, and further characterize the notion of being *uniformly good* in all problem instances. We present a novel learning algorithm that efficiently balances *exploration* and *exploitation* intrinsic to this problem, without any prior knowledge of the chains. We provide finite-sample PAC-type guarantees on the performance of the algorithm. Further, we show that our algorithm asymptotically attains an optimal loss.
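To make the estimation task concrete, the following is a minimal sketch of learning one chain's transition matrix from a single trajectory using empirical transition frequencies. It is not the paper's allocation algorithm, and the unweighted squared error used here is only a stand-in for the paper's loss, whose exact definition is not reproduced in this summary. The function names `sample_chain`, `estimate_transition_matrix`, and `squared_loss`, as well as the uniform fallback for unvisited states, are assumptions made for illustration.

```python
import random


def sample_chain(P, n_steps, seed=0, start=0):
    """Simulate n_steps transitions of a chain with transition matrix P."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps):
        row = P[states[-1]]
        # Draw the next state according to the current state's row of P.
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def estimate_transition_matrix(trajectory, n_states):
    """Empirical transition frequencies from one observed trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in zip(trajectory, trajectory[1:]):
        counts[s][s_next] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: fall back to a uniform row for unvisited states.
            P_hat.append([1.0 / n_states] * n_states)
        else:
            P_hat.append([c / total for c in row])
    return P_hat


def squared_loss(P_hat, P):
    """Sum of squared entrywise errors (an illustrative stand-in loss)."""
    return sum(
        (a - b) ** 2
        for row_hat, row in zip(P_hat, P)
        for a, b in zip(row_hat, row)
    )


if __name__ == "__main__":
    P_true = [[0.9, 0.1], [0.5, 0.5]]
    trajectory = sample_chain(P_true, n_steps=5000, seed=1)
    P_hat = estimate_transition_matrix(trajectory, n_states=2)
    print(P_hat, squared_loss(P_hat, P_true))
```

In the multi-chain setting described above, a learner would maintain one such estimate per chain and decide at each step which chain to sample next; that decision rule is the subject of the paper and is not sketched here.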

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.11128/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1905.11128/full.md

---
Source: https://tomesphere.com/paper/1905.11128