Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
Jan Křetínský, Guillermo A. Pérez, Jean-François Raskin

TL;DR
This paper develops online-learning strategies for MDPs with unknown transition probabilities and rewards that maximize the mean payoff while satisfying a parity constraint; the guarantees obtained depend on whether the strategy uses finite or infinite memory.
Contribution
It introduces online-learning strategies for unknown MDPs that balance mean-payoff optimization against parity constraints, proves their guarantees, and shows that these guarantees are tight.
Findings
Finite-memory strategies satisfy the parity objective almost surely and achieve a near-optimal mean payoff with high probability.
Infinite-memory strategies satisfy the parity objective surely while achieving a near-optimal mean payoff with high probability.
The guarantees are tight: there are MDPs for which no stronger combination of guarantees can be ensured.
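To make the two objectives concrete: on an ultimately periodic ("lasso") run, that is, a finite prefix followed by a cycle repeated forever, both the mean payoff and the parity condition can be read off the cycle alone. The sketch below is illustrative only. It assumes the max-even parity convention (a run is winning iff the largest priority seen infinitely often is even), which the paper may or may not use, and the function names are hypothetical.

```python
from fractions import Fraction


def mean_payoff_lasso(prefix_rewards, cycle_rewards):
    """Mean payoff (limit average reward) of the run prefix . cycle^omega.

    The finite prefix does not affect the limit, so the value is the
    exact average reward over one traversal of the cycle.
    """
    if not cycle_rewards:
        raise ValueError("cycle must be non-empty")
    return Fraction(sum(cycle_rewards), len(cycle_rewards))


def parity_satisfied_lasso(prefix_priorities, cycle_priorities):
    """Parity condition on prefix . cycle^omega (assumed max-even convention).

    The priorities seen infinitely often are exactly those on the cycle,
    so the run is winning iff the largest cycle priority is even.
    """
    if not cycle_priorities:
        raise ValueError("cycle must be non-empty")
    return max(cycle_priorities) % 2 == 0


# Usage: rewards 1, 2, 3 repeat forever, so the mean payoff is 2;
# priorities 0, 2, 1 repeat forever, and 2 is the largest, so parity holds.
print(mean_payoff_lasso([5, 0], [1, 2, 3]))
print(parity_satisfied_lasso([3], [0, 2, 1]))
```

The prefix arguments are deliberately ignored: changing finitely many steps changes neither the limit average nor the set of priorities seen infinitely often, which is why both objectives are prefix-independent.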
Abstract
We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with an unknown probabilistic transition function and an unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all $\varepsilon$ and $\gamma$ we can construct an online-learning finite-memory strategy that almost surely satisfies the parity objective and which achieves an $\varepsilon$-optimal mean payoff with probability at least $1 - \gamma$. (ii) Alternatively, for all $\varepsilon$ and $\gamma$ there exists an online-learning…
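The abstract assumes that the support of the transition function and a lower bound p_min on the minimal transition probability are known in advance. As a rough illustration of why such knowledge helps, the following sketch estimates the successor distribution of a single state-action pair from observed samples, rejecting any successor outside the known support, and computes two textbook sample counts: a two-sided Hoeffding bound with a union bound over the support, so that every estimate is within epsilon of the true probability with probability at least 1 - gamma, and the number of samples after which every successor has been observed at least once with probability at least 1 - gamma, using p_min. These are standard concentration arguments, not the bounds derived in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter


def samples_needed(epsilon, gamma, support_size):
    """Samples so that all empirical frequencies are epsilon-accurate w.p. >= 1 - gamma.

    Per successor, Hoeffding gives P(|p_hat - p| > eps) <= 2 exp(-2 n eps^2);
    a union bound over support_size successors requires
    2 * support_size * exp(-2 n eps^2) <= gamma.
    """
    return math.ceil(math.log(2 * support_size / gamma) / (2 * epsilon ** 2))


def successors_seen_samples(p_min, support_size, gamma):
    """Samples so that every successor is observed at least once w.p. >= 1 - gamma.

    Each successor has probability >= p_min, so it stays unseen after n
    samples with probability <= (1 - p_min)^n; a union bound requires
    support_size * (1 - p_min)^n <= gamma.
    """
    return math.ceil(math.log(support_size / gamma) / -math.log1p(-p_min))


def empirical_distribution(observed, support):
    """Empirical successor frequencies, restricted to the known support."""
    counts = Counter(observed)
    unexpected = set(counts) - set(support)
    if unexpected:
        # With a known support, such an observation contradicts the model.
        raise ValueError(f"successors outside the known support: {unexpected}")
    n = len(observed)
    return {s: counts[s] / n for s in support}


# Usage with a two-successor support.
print(samples_needed(0.1, 0.05, 2))
print(successors_seen_samples(0.5, 2, 0.05))
print(empirical_distribution(["s1", "s2", "s1", "s1"], ["s1", "s2"]))
```

For the values above, 220 samples suffice for 0.1-accurate estimates with probability at least 0.95, and 6 samples suffice to see both successors with the same confidence (2 * 0.5^6 = 0.03125 <= 0.05).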
