Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Xavier Venel; Bruno Ziliotto

arXiv:1505.07495·math.OC·September 9, 2015

TL;DR

This paper proves that gambling houses, MDPs, and POMDPs admit a pathwise uniform value: for any epsilon>0, the decision-maker has a single pure strategy that is epsilon-optimal in every sufficiently long finite-horizon problem and that also guarantees more than lim v(n)-epsilon under the long-run average payoff criterion.

Contribution

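It proves the existence of a pathwise uniform value in standard dynamic programming models (gambling houses, MDPs, POMDPs). This resolves two open problems: whether epsilon-optimal strategies for long finite horizons can be taken pure rather than randomized, and whether such a strategy also guarantees more than lim v(n)-epsilon under the long-run average payoff criterion.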

Findings

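1. For any epsilon>0, a single pure strategy is epsilon-optimal in every n-stage problem, provided n is large enough

2. The same strategy guarantees more than lim v(n)-epsilon under the long-run average payoff criterion (expectation of the liminf of the average payoffs)

3. Results apply to gambling houses, MDPs, and POMDPs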
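
To make the finite-horizon values v(n) concrete, here is a minimal sketch that computes them by backward induction on a toy MDP. The two-state model, the function name `n_stage_values`, and the data layout are all hypothetical illustrations, not taken from the paper, and the sketch only evaluates v(n); it does not reproduce the paper's construction of strategies. In this toy model, state 0 pays 0 and lets the decision-maker stay or move to state 1, which is absorbing and pays 1 per stage, so v(n) from state 0 equals (n-1)/n and the pure strategy "always move" attains it for every n.

```python
def n_stage_values(transitions, rewards, n):
    """Return the normalized n-stage value v_n(s) for every state s.

    transitions[s][a] is a list of probabilities over next states,
    rewards[s][a] is the stage payoff of action a in state s.
    """
    num_states = len(rewards)
    total = [0.0] * num_states  # best expected total payoff with 0 stages left
    for _ in range(n):
        # One step of backward induction: add a stage in front of the horizon.
        total = [
            max(
                rewards[s][a]
                + sum(p * total[t] for t, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            )
            for s in range(num_states)
        ]
    # Normalize the total payoff into an average payoff per stage.
    return [v / n for v in total]


# Hypothetical toy MDP (an assumption for illustration only).
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 = stay, action 1 = move
    [[0.0, 1.0]],              # state 1: absorbing
]
rewards = [
    [0.0, 0.0],  # state 0 pays nothing whichever action is chosen
    [1.0],       # state 1 pays 1 per stage
]

if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, n_stage_values(transitions, rewards, n))
```

The values from state 0 increase toward 1 as n grows, which is the limit lim v(n) that the findings above refer to.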

Abstract

In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker has more than lim v(n)-epsilon.
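
The two properties in the abstract can be written out as follows. This is a sketch in assumed notation (the symbols x for the initial state, r_m for the stage-m payoff, sigma for a strategy, and v_infinity for lim v(n) are choices made here, not quoted from the paper), and the weak inequalities stand in for the abstract's "more than".

```latex
% n-stage value from initial state x, and its limit
\[
  v_n(x) = \sup_{\sigma} \mathbb{E}^{x}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right],
  \qquad
  v_\infty(x) = \lim_{n\to\infty} v_n(x).
\]
% Uniform epsilon-optimality: one strategy works for all long horizons
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists N,\ \forall n \ge N:\quad
  \mathbb{E}^{x}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right] \ge v_\infty(x) - \varepsilon.
\]
% Pathwise guarantee: the liminf is taken inside the expectation
\[
  \mathbb{E}^{x}_{\sigma}\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} r_m\right] \ge v_\infty(x) - \varepsilon.
\]
```

The third line is the stronger requirement: the liminf is evaluated along each realized path before taking the expectation, which is what "pathwise" refers to.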

Taxonomy

Topics: Economic theories and models · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research