History-dependent evaluations in POMDPs

Xavier Venel; Bruno Ziliotto

arXiv:2004.08844 · math.OC · April 21, 2020 · 1 citation

Open Access

TL;DR

This paper studies POMDPs in which the weight of each stage payoff depends on the past signals and actions. It proves that for every epsilon>0 there is a strategy that is epsilon-optimal for any weight sequence satisfying a patience condition, which unifies several previous results and applies to POMDPs with limsup payoffs.

Contribution

It introduces a general framework for history-dependent evaluations in POMDPs and proves the existence of epsilon-optimal strategies for patient decision-makers.

Findings

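01. For every epsilon>0, there exists a strategy that is epsilon-optimal for every weight sequence satisfying the patience condition.

02. Unification and generalization of several earlier results on POMDP evaluations.

03. Applicability to POMDPs with limsup payoffs.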

Abstract

We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon>0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough". This unifies and generalizes several results in the literature, and applies notably to POMDPs with limsup payoffs.
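
To make the notion of a history-dependent weight concrete, here is a minimal Python sketch of how one realized play could be evaluated. It is an illustration only, not the paper's formalism: the names `evaluate`, `n_stage`, `discounted`, and `average_after_signal` are hypothetical, the trigger-based weight is an invented example, and the paper's patience condition is not modeled here.

```python
from typing import Callable, Sequence, Tuple

# A history is the tuple of (action, signal) pairs observed before the current stage.
History = Tuple[Tuple[str, str], ...]
WeightFn = Callable[[int, History], float]


def evaluate(payoffs: Sequence[float],
             steps: Sequence[Tuple[str, str]],
             weight: WeightFn) -> float:
    """Weighted sum of stage payoffs along one realized play.

    Stage m (1-indexed) receives weight(m, h), where h holds the
    (action, signal) pairs of stages 1..m-1.
    """
    total = 0.0
    for m, g in enumerate(payoffs, start=1):
        history = tuple(steps[:m - 1])
        total += weight(m, history) * g
    return total


def n_stage(n: int) -> WeightFn:
    # History-independent special case: uniform weight on the first n stages.
    return lambda m, h: 1.0 / n if m <= n else 0.0


def discounted(lam: float) -> WeightFn:
    # History-independent special case: weight lam * (1 - lam)^(m - 1).
    return lambda m, h: lam * (1.0 - lam) ** (m - 1)


def average_after_signal(trigger: str, n: int) -> WeightFn:
    # Hypothetical history-dependent weight: average over the n stages
    # that follow the first observation of `trigger`.
    def w(m: int, h: History) -> float:
        signals = [s for (_, s) in h]
        if trigger not in signals:
            return 0.0
        first = signals.index(trigger) + 1  # stage at which the trigger was seen
        return 1.0 / n if first < m <= first + n else 0.0
    return w


# One realized play: stage payoffs and the (action, signal) pair of each stage.
payoffs = [1.0, 0.0, 1.0, 1.0]
steps = [("a", "x"), ("a", "s"), ("b", "x"), ("b", "x")]

avg = evaluate(payoffs, steps, n_stage(4))
disc = evaluate(payoffs, steps, discounted(0.5))
after = evaluate(payoffs, steps, average_after_signal("s", 2))
```

The n-stage and discounted weights ignore the history argument, which is how fixed-weight evaluations appear as special cases in this sketch. The third weight only starts counting once the signal `"s"` has been observed, so the same payoff stream can receive different evaluations along different histories.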

Taxonomy

Topics: Formal Methods in Verification · Reinforcement Learning in Robotics · Machine Learning and Algorithms