Markov Decision Processes with Multiple Long-run Average Objectives

Tomáš Brázdil (Faculty of Informatics, Masaryk University), Václav Brožek (Faculty of Informatics, Masaryk University), Krishnendu Chatterjee (IST Austria), Vojtěch Forejt (Department of Computer Science, Oxford University), Antonín Kučera (Faculty of Informatics, Masaryk University)

arXiv:1104.3489·cs.GT·July 1, 2015·LoG

TL;DR

This paper analyzes Markov decision processes with multiple long-run average objectives. It characterizes how much memory and randomization strategies need, and gives polynomial-time algorithms for the decision problems together with algorithms for approximating the Pareto curve.

Contribution

The paper gives a comprehensive analysis of strategies for MDPs with multiple mean-payoff functions, corrects flaws in previous work, and offers new polynomial-time algorithms.

Findings

01. Randomization and memory are both necessary for expectation objectives, even for epsilon-approximation.

02. Infinite memory is required for certain satisfaction objectives.

03. The decision problems can be solved in polynomial time, and the Pareto curve can be epsilon-approximated.

Abstract

We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a…
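
The two objectives described in the abstract can be written out roughly as follows. This is a sketch under assumed notation that does not appear in the text above: a run is written as a sequence of states and actions, the reward functions r_1, ..., r_k are assumed to be defined on actions, the limit-average is taken as a lim inf, and sigma denotes a strategy.

```latex
% Assumed notation (not from the abstract): a run \omega = s_1 a_1 s_2 a_2 \dots,
% reward functions r_1, \dots, r_k on actions, and a strategy \sigma.

% Limit-average (mean-payoff) value of reward function r_i along a run \omega:
\[
  \mathrm{lr}(r_i)(\omega) \;=\; \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r_i(a_t),
  \qquad 1 \le i \le k .
\]

% Expectation objective: a vector v = (v_1, \dots, v_k) is achievable
% if some strategy \sigma satisfies
\[
  \mathbb{E}^{\sigma}\bigl[\mathrm{lr}(r_i)\bigr] \;\ge\; v_i
  \quad \text{for all } 1 \le i \le k .
\]

% Satisfaction objective: for a given vector v, maximize over \sigma the probability
\[
  \mathbb{P}^{\sigma}\bigl[\, \mathrm{lr}(r_i) \ge v_i \text{ for all } 1 \le i \le k \,\bigr] .
\]
```

The difference between the two is where the comparison with the vector happens: the expectation objective compares the vector with an average over all runs, while the satisfaction objective compares it with each run separately and then measures the probability of the runs that pass.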
