Mean-Variance Optimization in Markov Decision Processes
Shie Mannor, John Tsitsiklis

TL;DR
This paper studies mean-variance optimization in finite horizon Markov decision processes, revealing computational hardness and providing algorithms for optimal and approximate solutions.
Contribution
It establishes complexity results for mean-variance optimization in finite horizon MDPs and offers pseudopolynomial exact and approximation algorithms.
Findings
Computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others.
Either randomized or history-based policies can improve performance.
Pseudopolynomial exact and approximation algorithms are provided.
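To see why randomization can help under a variance constraint, here is a toy one-step example. It is an illustration constructed for this summary, not an instance taken from the paper: the two actions, their rewards, and the cap `VARIANCE_CAP` are all assumptions. A safe action pays 0 with certainty, and a risky action pays 2 or 0 with equal probability.

```python
import math

VARIANCE_CAP = 0.5  # hypothetical variance constraint


def mean_and_variance(p):
    """Mean and variance of the reward when the risky action is played with probability p.

    Safe action: reward 0 with certainty.
    Risky action: reward 2 or 0, each with probability 1/2.
    """
    mean = p * 1.0            # E[R]   = p * (0.5 * 2)
    second_moment = p * 2.0   # E[R^2] = p * (0.5 * 4)
    return mean, second_moment - mean ** 2


# Deterministic policies correspond to p = 0 (always safe) or p = 1 (always risky).
best_deterministic = max(
    mean_and_variance(p)[0]
    for p in (0.0, 1.0)
    if mean_and_variance(p)[1] <= VARIANCE_CAP
)

# Randomized policy: the variance 2p - p^2 is increasing on [0, 1],
# so the largest feasible p solves 2p - p^2 = cap, i.e. p = 1 - sqrt(1 - cap).
p_star = 1.0 - math.sqrt(1.0 - VARIANCE_CAP)
best_randomized = mean_and_variance(p_star)[0]

print(best_deterministic, best_randomized)
```

In this toy instance the only feasible deterministic choice is the safe action, with mean 0, while mixing in the risky action with probability about 0.29 meets the variance cap and yields a strictly positive mean.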
Abstract
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.
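As a rough sketch of the quantities involved, the mean and variance of the cumulative reward under a fixed deterministic Markov policy can be computed by a backward recursion on the first and second moments of the reward-to-go. This is a standard policy-evaluation recursion, not the paper's optimization algorithm. The function name `reward_moments`, the data layout (`P[a][s][s2]` for transition probabilities, `r[s][a]` for a deterministic reward), and the three-state example are assumptions made for illustration.

```python
def reward_moments(P, r, policy, horizon):
    """Mean and variance of the cumulative reward from each start state.

    P[a][s][s2]  : probability of moving from s to s2 under action a (assumed layout)
    r[s][a]      : deterministic reward for taking action a in state s
    policy[t][s] : action chosen at time t in state s
    """
    n = len(r)
    m = [0.0] * n  # first moment of the reward-to-go, zero after the last step
    q = [0.0] * n  # second moment of the reward-to-go
    for t in reversed(range(horizon)):
        new_m, new_q = [0.0] * n, [0.0] * n
        for s in range(n):
            a = policy[t][s]
            exp_m = sum(P[a][s][s2] * m[s2] for s2 in range(n))
            exp_q = sum(P[a][s][s2] * q[s2] for s2 in range(n))
            new_m[s] = r[s][a] + exp_m
            # E[(r + G)^2] = r^2 + 2 r E[G] + E[G^2]
            new_q[s] = r[s][a] ** 2 + 2.0 * r[s][a] * exp_m + exp_q
        m, q = new_m, new_q
    variance = [q[s] - m[s] ** 2 for s in range(n)]
    return m, variance


# Hypothetical example: one action; state 0 moves to state 1 or state 2 with
# probability 1/2 each; state 1 pays 2 per step, the other states pay 0.
P = [[[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
r = [[0.0], [2.0], [0.0]]
policy = [[0, 0, 0], [0, 0, 0]]
means, variances = reward_moments(P, r, policy, horizon=2)
print(means, variances)
```

The recursion only evaluates a given policy. Choosing a policy that maximizes the mean subject to a variance constraint is the hard problem the abstract refers to.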
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
