Mean-Variance Optimization of Discrete Time Discounted Markov Decision Processes
Li Xia

TL;DR
This paper introduces a novel approach to minimizing the variance of rewards, subject to a constraint on the mean, in infinite-horizon discounted Markov decision processes. It transforms the constrained problem into an equivalent unconstrained MDP, develops a policy iteration algorithm, and proves that deterministic policies are optimal.
Contribution
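It presents a new method for mean-variance optimization in MDPs: by proving that the feasible policy space has a decomposable structure, it reformulates the constrained variance minimization problem as an unconstrained MDP with a new discounted criterion and a new reward function, which can then be solved by policy iteration.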
Findings
A variance difference formula quantifies how the variance changes between any two feasible policies.
The policy iteration algorithm converges to the optimal policy.
Deterministic policies are proven to be optimal among all policies, including randomized ones, in this setting.
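Because the constrained problem is said to become an unconstrained discounted MDP, a textbook policy iteration can in principle be run on the transformed problem. The sketch below is generic policy iteration for a finite discounted MDP that minimizes a per-step cost `c[s][a]`. The array layout, the names `solve` and `policy_iteration`, the cost, and the discount factor `beta` are hypothetical placeholders, not the paper's transformed reward or its new discounted criterion, and the mean constraint is not enforced. The loop only searches deterministic policies, which is consistent with (but does not prove) the finding that deterministic policies suffice.

```python
from typing import List, Tuple

# Hypothetical finite MDP layout (an assumption for illustration):
#   P[s][a][j] = probability of moving from state s to state j under action a
#   c[s][a]    = one-step cost to minimize (a stand-in for a transformed reward)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def policy_iteration(P, c, beta: float, eps: float = 1e-12) -> Tuple[List[int], List[float]]:
    """Standard policy iteration for a discounted cost-minimization MDP.

    Only deterministic policies (one action index per state) are searched.
    """
    n = len(P)
    policy = [0] * n  # start from "action 0 in every state"
    while True:
        # Policy evaluation: solve (I - beta * P_d) v = c_d exactly.
        A = [[(1.0 if i == j else 0.0) - beta * P[i][policy[i]][j] for j in range(n)]
             for i in range(n)]
        v = solve(A, [c[i][policy[i]] for i in range(n)])
        # Policy improvement: switch an action only if it is strictly better,
        # which prevents cycling between tied actions.
        stable = True
        for s in range(n):
            q = [c[s][a] + beta * sum(p * v[j] for j, p in enumerate(P[s][a]))
                 for a in range(len(P[s]))]
            best = min(range(len(q)), key=q.__getitem__)
            if q[best] < q[policy[s]] - eps:
                policy[s] = best
                stable = False
        if stable:
            return policy, v
```

For example, with `P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]`, `c = [[1, 0], [2, 0.5]]`, and `beta = 0.5`, the loop settles on the policy `[1, 1]` with discounted costs of about `[1/3, 2/3]`. Exact linear solves are used for evaluation so that the strict-improvement test compares exact values rather than approximate ones.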
Abstract
In this paper, we study a mean-variance optimization problem in an infinite-horizon discrete-time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards subject to a constraint on the mean performance. Unlike most works in the literature, which require the mean performance to already be optimal, we allow the mean discounted performance to equal any given constant. The difficulty of this problem stems from the quadratic form of the variance function, which prevents the variance minimization problem from being a standard MDP. By proving the decomposable structure of the feasible policy space, we transform this constrained variance minimization problem into an equivalent unconstrained MDP under a new discounted criterion and a new reward function. The difference of the variances of Markov chains under any two feasible policies is quantified by a difference…
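The abstract attributes the difficulty to the quadratic form of the variance. One way to see why pinning the mean to a constant helps: if deviations are measured from a fixed target `mu` rather than from a policy-dependent mean, the squared deviation `(r - mu)**2` is an ordinary per-state reward, so the variance can be evaluated like any other discounted value. The sketch below assumes a normalized discounted mean `eta(s) = (1 - beta) * E[sum_t beta^t r(X_t) | X_0 = s]` and a variance `(1 - beta) * E[sum_t beta^t (r(X_t) - mu)^2 | X_0 = s]` for a fixed policy's Markov chain. Both definitions, and the names `discounted_value` and `mean_and_variance`, are illustrative assumptions, not necessarily the paper's exact formulation.

```python
from typing import List, Tuple


def discounted_value(P: List[List[float]], r: List[float], beta: float,
                     iters: int = 2000) -> List[float]:
    """Evaluate v = r + beta * P v by fixed-point iteration (converges since beta < 1)."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


def mean_and_variance(P: List[List[float]], r: List[float], beta: float,
                      mu: float) -> Tuple[List[float], List[float]]:
    # ASSUMED normalized discounted mean: eta(s) = (1 - beta) * E[sum_t beta^t r(X_t)].
    eta = [(1 - beta) * x for x in discounted_value(P, r, beta)]
    # ASSUMED variance: deviations are taken from the fixed target mu.
    # With mu held constant, (r - mu)^2 is just another per-state reward,
    # so evaluating it is a standard discounted policy evaluation.
    dev = [(x - mu) ** 2 for x in r]
    sigma2 = [(1 - beta) * x for x in discounted_value(P, dev, beta)]
    return eta, sigma2
```

For a two-state chain with `P = [[0.5, 0.5], [0.5, 0.5]]`, `r = [0, 2]`, `beta = 0.5`, and `mu = 1`, this gives `eta` of about `[0.5, 1.5]` and `sigma2` of about `[1.0, 1.0]`. Starting from the uniform initial distribution, the mean is `0.5 * 0.5 + 0.5 * 1.5 = 1`, which is why `mu = 1` was chosen for the toy example.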
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Control Systems Optimization
