Mean-Variance Optimization in Markov Decision Processes

Shie Mannor; John Tsitsiklis

arXiv:1104.5601·cs.LG·May 2, 2011·44 cites

Open Access

TL;DR

This paper studies mean-variance optimization in finite horizon Markov decision processes, showing that maximizing the mean reward under a variance constraint is NP-hard and giving pseudopolynomial exact and approximation algorithms.

Contribution

It establishes complexity results for mean-variance optimization in finite horizon MDPs and offers pseudopolynomial exact and approximation algorithms.

Findings

01

Maximizing the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others.

02

Either randomized or history-based policies can improve performance.

03

Pseudopolynomial exact and approximation algorithms are provided.

Abstract

We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.
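
To make the claim about randomized policies concrete, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes integer rewards and tracks the exact distribution of the cumulative reward by carrying the accumulated reward alongside the state, a state augmentation in the spirit of pseudopolynomial methods, since the number of (state, accumulated reward) pairs grows with the magnitude of the rewards rather than with their bit length. The names `reward_distribution` and `mean_var`, the dictionary layout of `P`, `R`, and `policy`, and the one-step toy MDP are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def reward_distribution(P, R, policy, s0, T):
    """Exact distribution of the cumulative reward after T steps.

    Hypothetical layout (an assumption of this sketch, not from the paper):
      P[s][a]      -> list of (next_state, probability)
      R[s][a]      -> list of (integer_reward, probability), independent of next state
      policy[t][s] -> dict mapping action -> probability (randomized Markov policy)
    """
    dist = {(s0, 0): 1.0}  # probability mass over (state, accumulated reward)
    for t in range(T):
        nxt = defaultdict(float)
        for (s, w), p in dist.items():
            for a, pa in policy[t][s].items():
                for r, pr in R[s][a]:
                    for s2, ps in P[s][a]:
                        nxt[(s2, w + r)] += p * pa * pr * ps
        dist = nxt
    # Marginalize out the final state, keeping only the accumulated reward.
    out = defaultdict(float)
    for (_, w), p in dist.items():
        out[w] += p
    return dict(out)


def mean_var(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    m = sum(w * p for w, p in dist.items())
    v = sum((w - m) ** 2 * p for w, p in dist.items())
    return m, v


# Toy one-step MDP: "safe" pays 0; "risky" pays 0 or 2 with equal probability.
P = {0: {"safe": [(0, 1.0)], "risky": [(0, 1.0)]}}
R = {0: {"safe": [(0, 1.0)], "risky": [(0, 0.5), (2, 0.5)]}}

safe = mean_var(reward_distribution(P, R, [{0: {"safe": 1.0}}], 0, 1))
risky = mean_var(reward_distribution(P, R, [{0: {"risky": 1.0}}], 0, 1))
mixed = mean_var(
    reward_distribution(P, R, [{0: {"safe": 0.75, "risky": 0.25}}], 0, 1)
)

for name, (m, v) in [("safe", safe), ("risky", risky), ("mixed", mixed)]:
    print(f"{name}: mean={m}, variance={v}")
```

Under a variance cap of 0.5 in this toy instance, the deterministic "risky" choice (mean 1, variance 1) is infeasible and the deterministic "safe" choice yields mean 0, while the 25% mixture stays feasible with mean 0.25 and variance 0.4375. With a single step there is no history to exploit, so the example illustrates only the benefit of randomization.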

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems