Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
Xavier Venel, Bruno Ziliotto

TL;DR
This paper establishes the existence of a pathwise uniform value in gambling houses, MDPs, and POMDPs. As a consequence, the decision-maker has a single pure strategy that is nearly optimal both in every sufficiently long finite-horizon problem and under the long-run average payoff criterion.
Contribution
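It proves the existence of a pathwise uniform value in standard dynamic programming models, resolving two open problems: whether epsilon-optimal strategies for all sufficiently long horizons can be taken pure (without randomization), and whether such a strategy can also guarantee more than lim v(n)-epsilon under the long-run average payoff criterion.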
Findings
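For any epsilon>0, there is a pure strategy that is epsilon-optimal in every n-stage problem, provided n is large enough (previously known only for behavior strategies)
The same strategy can be chosen to guarantee more than lim v(n)-epsilon under the long-run average payoff criterion (expectation of the liminf of the average payoffs)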
Results apply to gambling houses, MDPs, and POMDPs
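As an illustration of the quantity v(n) that these findings refer to, the following minimal sketch computes n-stage values by backward induction in a small, fully observed MDP. The two-state example, the names `REWARDS`, `TRANSITIONS`, and `n_stage_values`, and the convention that v(n) is the best expected average payoff over the first n stages are assumptions made for this illustration, not material from the paper. It only shows v(n) converging as n grows; it does not implement the paper's construction of a single pure strategy.

```python
# Hypothetical two-state MDP, chosen only for illustration.
# State 0: action 0 stays put; action 1 moves to state 1 with probability 0.5.
# State 1: absorbing, with payoff 1 at every stage.
REWARDS = [
    [0.0, 0.0],  # payoffs in state 0 for actions 0 and 1
    [1.0],       # payoff in state 1 (single action)
]
TRANSITIONS = [
    [[1.0, 0.0], [0.5, 0.5]],  # next-state distributions from state 0
    [[0.0, 1.0]],              # state 1 is absorbing
]


def n_stage_values(rewards, transitions, n):
    """Return v(n) for each initial state: the best expected average
    payoff over n stages, computed by backward induction."""
    num_states = len(rewards)
    total = [0.0] * num_states  # best expected total payoff with 0 stages left
    for _ in range(n):
        total = [
            max(
                rewards[s][a]
                + sum(p * total[t] for t, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            )
            for s in range(num_states)
        ]
    return [value / n for value in total]


if __name__ == "__main__":
    for horizon in (1, 2, 10, 100, 1000):
        print(horizon, n_stage_values(REWARDS, TRANSITIONS, horizon))
```

In this toy example v(n) from state 0 increases toward 1 as n grows, which is the limit lim v(n) that appears in the findings.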
Abstract
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker has more than lim v(n)-epsilon.
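The two consequences stated in the abstract can be written out as follows. This is a sketch under assumed notation, not the paper's own: r_m denotes the payoff at stage m, x_1 the initial state, E_sigma the expectation under strategy sigma, and v_n(x_1) the n-stage value written v(n) above.

```latex
% Assumed notation (not taken from the source): r_m is the stage-m payoff,
% x_1 the initial state, \sigma a strategy.
\[
  v_n(x_1) \;=\; \sup_{\sigma}\,
  \mathbb{E}^{x_1}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right].
\]
% For every \varepsilon > 0 there exist a pure strategy \sigma and an
% integer N such that both properties hold for the same \sigma:
\[
  \text{(1)}\quad
  \mathbb{E}^{x_1}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; v_n(x_1) - \varepsilon
  \qquad \text{for all } n \ge N,
\]
\[
  \text{(2)}\quad
  \mathbb{E}^{x_1}_{\sigma}\!\left[\liminf_{n\to\infty}
  \frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; \lim_{n\to\infty} v_n(x_1) - \varepsilon .
\]
```

In (2) the liminf is taken inside the expectation, so the guarantee holds for the realized average payoffs along the play rather than only for their expectations.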
Taxonomy
Topics: Economic theories and models · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
