Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

Jan Křetínský; Guillermo A. Pérez; Jean-François Raskin

arXiv:1804.08924 · cs.AI · August 24, 2018

TL;DR

This paper develops online-learning strategies for unknown MDPs that maximize the mean payoff while satisfying a parity constraint. The strength of the parity guarantee depends on whether the strategy uses finite or infinite memory.

Contribution

The paper introduces online-learning strategies for unknown MDPs that combine mean-payoff optimization with parity constraints, and it proves that the resulting guarantees are tight.

Findings

1. Finite-memory online-learning strategies satisfy the parity objective almost surely and achieve an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$.

2. Infinite-memory online-learning strategies satisfy the parity objective surely while still optimizing the mean payoff.

3. The guarantees are proven to be tight: in some cases no stronger guarantee is achievable.

Abstract

We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $\epsilon$ and $\gamma$ we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an $\epsilon$-optimal mean payoff with probability at least $1 - \gamma$. (ii) Alternatively, for all $\epsilon$ and $\gamma$ there exists an online-learning…
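
To make the two objectives named in the abstract concrete, the following minimal sketch evaluates them on a single ultimately periodic run (a finite prefix followed by a cycle repeated forever). It is an illustration only, not the paper's learning strategy. The function names, the example run, the use of integer rewards, and the parity convention (the smallest priority seen infinitely often must be even) are all assumptions made for this sketch; some authors use the largest priority instead.

```python
from fractions import Fraction


def mean_payoff(prefix_rewards, cycle_rewards):
    """Long-run average reward of the run prefix . cycle^omega.

    The finite prefix does not influence the limit average, so only the
    cycle contributes. Rewards are assumed to be integers here.
    """
    if not cycle_rewards:
        raise ValueError("cycle must be non-empty")
    return Fraction(sum(cycle_rewards), len(cycle_rewards))


def satisfies_parity(prefix_priorities, cycle_priorities):
    """Parity check under the assumed min-even convention.

    The priorities seen infinitely often are exactly those on the cycle;
    the run is accepting when the smallest of them is even.
    """
    if not cycle_priorities:
        raise ValueError("cycle must be non-empty")
    return min(cycle_priorities) % 2 == 0


# Hypothetical run: two transient steps, then a cycle of length three.
prefix_rewards, cycle_rewards = [0, 5], [1, 3, 2]
prefix_priorities, cycle_priorities = [1, 1], [2, 3, 2]

mp = mean_payoff(prefix_rewards, cycle_rewards)
ok = satisfies_parity(prefix_priorities, cycle_priorities)
print(mp, ok)  # prints "2 True"
```

In this example the odd priorities in the prefix are irrelevant because they occur only finitely often, and the reward of 5 in the prefix does not change the long-run average. Both objectives depend only on what happens in the limit, which is why the paper can combine them with a learning phase of finite length.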
