Optimising expectation with guarantees for window mean payoff in Markov decision processes
Pranshu Gaba, Shibashis Guha

TL;DR
This paper develops algorithms for synthesising strategies in Markov decision processes that optimise window mean-payoff with guarantees, providing complexity results and strategy types needed for different guarantee levels.
Contribution
It introduces methods for maximising the expected window mean-payoff in MDPs under sure, almost-sure, and probabilistic guarantees, and analyses the complexity and strategy requirements of each case.
Findings
All three guarantee problems are in PTIME for the fixed window mean-payoff objective.
All three problems are in NP ∩ coNP for the bounded window mean-payoff objective.
Pure finite-memory strategies suffice for sure and almost-sure guarantees, while randomised strategies are needed in general for probabilistic guarantees.
Abstract
The window mean-payoff objective strengthens the classical mean-payoff objective by computing the mean-payoff over a finite window that slides along an infinite path. Two variants have been considered: in one variant, the maximum window length is fixed and given, while in the other, it is not fixed but is required to be bounded. In this paper, we look at the problem of synthesising strategies in Markov decision processes that maximise the window mean-payoff value in expectation, while also simultaneously guaranteeing that the value is above a certain threshold. We solve the synthesis problem for three different kinds of guarantees: sure (that needs to be satisfied in the worst-case, that is, for an adversarial environment), almost-sure (that needs to be satisfied with probability one), and probabilistic (that needs to be satisfied with at least some given probability). We show…
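As a rough illustration of the sliding-window idea described above, the following Python sketch checks, on a finite prefix of a path, which positions admit a window of length at most a fixed bound whose mean payoff meets a threshold. It follows the usual formulation of the fixed window objective in the literature and is an assumption, not necessarily the paper's exact definition; the function name `good_window_positions` and its parameters are hypothetical, and only positions with a full lookahead inside the prefix are examined.

```python
from typing import List


def good_window_positions(weights: List[int], max_len: int, threshold: int) -> List[bool]:
    """For each position i that has max_len steps of lookahead in the prefix,
    report whether some window starting at i, of length 1..max_len,
    has mean payoff >= threshold (assumed 'good window' condition)."""
    results = []
    for i in range(len(weights) - max_len + 1):
        total = 0
        closes = False
        for j in range(1, max_len + 1):
            total += weights[i + j - 1]
            # mean >= threshold  <=>  sum >= threshold * window length
            if total >= threshold * j:
                closes = True
                break
        results.append(closes)
    return results


# Hypothetical edge weights along a path prefix.
prefix = [-1, 2, -3, 1, 1, 4]
print(good_window_positions(prefix, 2, 0))  # [True, True, False, True, True]
print(good_window_positions(prefix, 4, 0))  # [True, True, True]
```

With a bound of 2, the window opened at the payoff -3 never recovers to a non-negative mean, whereas a bound of 4 gives it enough room to do so. This is why the fixed and bounded variants can behave differently on the same path.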
Taxonomy
Topics: Simulation Techniques and Applications
