Mean-Variance Optimization and Algorithm for Finite-Horizon Markov Decision Processes
Li Xia, Zhihui Yu

TL;DR
This paper introduces a novel approach for finite-horizon mean-variance optimization in Markov decision processes, transforming the problem into a bilevel MDP and proposing an iterative algorithm with convergence guarantees.
Contribution
It develops a bilevel MDP framework for mean-variance optimization and proposes an efficient iterative algorithm with convergence analysis, applicable to various MDP-based problems.
Findings
The algorithm converges to a local optimum.
A sufficient condition for global optimality is derived.
Application to portfolio optimization matches classical results.
Abstract
Multi-period mean-variance optimization is a long-standing problem whose difficulty stems from the failure of the dynamic programming principle. This paper studies mean-variance optimization in the setting of finite-horizon discrete-time Markov decision processes (MDPs), where the objective is to maximize a combined metric of the mean and variance of the accumulated rewards at the terminal stage. By introducing the concepts of pseudo mean and pseudo variance, we convert the original mean-variance MDP to a bilevel MDP, where the outer level is a single-parameter optimization over the pseudo mean and the inner level is a standard finite-horizon MDP whose state space is augmented with an auxiliary state of accumulated rewards. We further study the properties of this bilevel MDP, including the optimality of history-dependent deterministic policies and the piecewise quadratic concavity of the inner MDPs' optimal values with…
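The bilevel structure described in the abstract can be illustrated with a small sketch. It rests on the standard identity Var(W) = min over y of E[(W - y)^2], with the minimum attained at y = E[W], so that E[W] - β·Var(W) equals the maximum over y of E[W - β·(W - y)^2]. The sketch assumes that the pseudo variance is E[(W - y)^2] with pseudo mean y, that the combined metric is mean minus β times variance, and that the outer step simply resets y to the mean achieved by the inner optimal policy; the abstract above does not confirm these details. The toy MDP, the names `solve_inner`, `evaluate`, `mean_variance_iteration`, and the constants are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, horizon 3.
P = {  # P[s][a] = list of (next_state, probability)
    0: {0: [(0, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
}
R = {0: {0: 1, 1: 2}, 1: {0: 0, 1: 4}}  # integer reward r(s, a)
T, BETA, S0 = 3, 0.5, 0  # horizon, variance weight (assumed), initial state


def solve_inner(y):
    """Inner level: backward induction on the augmented state (t, s, w),
    where w is the accumulated reward. The terminal payoff is
    w - BETA * (w - y)**2, i.e. reward minus BETA times squared deviation from y."""
    policy = {}

    @lru_cache(maxsize=None)
    def value(t, s, w):
        if t == T:
            return w - BETA * (w - y) ** 2
        best_a, best_v = None, float("-inf")
        for a in (0, 1):
            v = sum(p * value(t + 1, s2, w + R[s][a]) for s2, p in P[s][a])
            if v > best_v:  # ties keep the lower-numbered action
                best_a, best_v = a, v
        policy[(t, s, w)] = best_a
        return best_v

    value(0, S0, 0)
    return policy


def evaluate(policy):
    """Forward pass: exact mean and variance of the terminal accumulated reward."""
    dist = {(S0, 0): 1.0}
    for t in range(T):
        nxt = {}
        for (s, w), q in dist.items():
            a = policy[(t, s, w)]
            for s2, p in P[s][a]:
                key = (s2, w + R[s][a])
                nxt[key] = nxt.get(key, 0.0) + q * p
        dist = nxt
    mean = sum(q * w for (_, w), q in dist.items())
    var = sum(q * (w - mean) ** 2 for (_, w), q in dist.items())
    return mean, var


def mean_variance_iteration(y=0.0, max_iter=100, tol=1e-12):
    """Outer level (assumed update rule): set y to the mean of the inner optimum."""
    history = []
    for _ in range(max_iter):
        mean, var = evaluate(solve_inner(y))
        history.append(mean - BETA * var)  # true mean-variance objective
        if abs(mean - y) <= tol:
            break
        y = mean
    return y, history


if __name__ == "__main__":
    y_star, objectives = mean_variance_iteration()
    print("pseudo mean at the fixed point:", y_star)
    print("objective per iteration:", objectives)
```

Under these assumptions the mean-variance objective cannot decrease from one iteration to the next: the inner step improves the pseudo objective for the current y, and resetting y to the realized mean can only shrink the penalty term. The fixed point this sketch reaches is a local optimum in that sense and need not be global.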
