History-dependent evaluations in POMDPs
Xavier Venel, Bruno Ziliotto

TL;DR
This paper studies POMDPs in which the weight of each stage payoff depends on past signals and actions. It proves that for every epsilon > 0 a single strategy is epsilon-optimal for all sufficiently patient weightings. This unifies previous results and covers POMDPs with limsup payoffs.
Contribution
It introduces a general framework for history-dependent evaluations in POMDPs. It proves that, for each epsilon > 0, one strategy is epsilon-optimal simultaneously for every evaluation under which the decision-maker is patient enough.
Findings
For every epsilon > 0, existence of a single strategy that is epsilon-optimal for all sufficiently patient weight sequences.
Unification of previous results on POMDP evaluations.
Applicability to POMDPs with limsup payoffs.
Abstract
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon > 0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough". This unifies and generalizes several results from the literature, and applies notably to POMDPs with limsup payoffs.
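The abstract describes evaluations in which the weight of each stage payoff is a function of the signals and actions observed so far. The following is a minimal Python sketch of such an evaluation along one finite, realized play (not the expectation over plays induced by a strategy). It rests on assumptions that are not taken from the paper: stages are indexed from 0, a history is a tuple of (signal, action) pairs, weights are nonnegative and sum to 1 along each play, and the total variation of the weight sequence ("impatience") is used only as an illustrative proxy for "patient enough", not as the paper's exact condition. The names `evaluate`, `impatience`, `cesaro`, and `switch_on_signal` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: one stage of history is a (signal, action) pair.
Stage = Tuple[str, str]
History = Tuple[Stage, ...]
# theta(h_t, t) -> weight of the payoff of stage t (0-indexed), given the
# history h_t of signals and actions observed strictly before stage t.
WeightFn = Callable[[History, int], float]


def weights_along(play: Sequence[Stage], theta: WeightFn) -> List[float]:
    """Weights theta_t(h_t) for t = 0..len(play)-1 along one finite play."""
    return [theta(tuple(play[:t]), t) for t in range(len(play))]


def evaluate(play: Sequence[Stage], payoffs: Sequence[float], theta: WeightFn) -> float:
    """History-dependent evaluation sum_t theta_t(h_t) * g_t of one play."""
    return sum(w * g for w, g in zip(weights_along(play, theta), payoffs))


def impatience(ws: Sequence[float]) -> float:
    """Total variation sum_t |theta_{t+1} - theta_t|, padding a final 0.

    Assumption: a stand-in patience measure (small means patient); it is
    not claimed to be the condition used in the paper.
    """
    padded = list(ws) + [0.0]
    return sum(abs(b - a) for a, b in zip(padded, padded[1:]))


def cesaro(n: int) -> WeightFn:
    """History-independent n-stage average: weight 1/n on stages 0..n-1."""
    return lambda h, t: 1.0 / n if t < n else 0.0


def switch_on_signal(h: History, t: int) -> float:
    """Hypothetical history-dependent rule: average over 4 stages, unless the
    signal at stage 1 is 'stop', in which case stage 2 takes all remaining weight."""
    if t < 2:
        return 0.25
    stopped = h[1][0] == "stop"  # h has length t >= 2 here, so h[1] exists
    if t == 2:
        return 0.5 if stopped else 0.25
    if t == 3:
        return 0.0 if stopped else 0.25
    return 0.0


if __name__ == "__main__":
    payoffs = [1.0, 0.0, 0.0, 1.0]
    play = [("go", "a"), ("go", "a"), ("go", "b"), ("go", "b")]
    stop_play = [("go", "a"), ("stop", "a"), ("go", "b"), ("go", "b")]
    print(evaluate(play, payoffs, cesaro(4)))                   # 0.5
    print(evaluate(stop_play, payoffs, switch_on_signal))       # 0.25
    print(impatience(weights_along(play, cesaro(4))))           # 0.25
    print(impatience(weights_along([("go", "a")] * 100, cesaro(100))))  # 0.01
```

In this sketch, the n-stage average is the history-independent special case, and its impatience 1/n shrinks as n grows, matching the intuition of a more patient decision-maker. The `switch_on_signal` rule shows how the same stage can receive different weights depending on what was observed earlier.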
Taxonomy
Topics: Formal Methods in Verification · Reinforcement Learning in Robotics · Machine Learning and Algorithms
