A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Shuai Ma, Xiaoteng Ma, and Li Xia

TL;DR
This paper introduces a unified algorithm framework for risk-averse mean-variance optimization in discounted MDPs. It handles the time inconsistency of the problem with a pseudo mean approach, shows that the resulting value iteration algorithm converges to a local optimum, and validates the method on a portfolio management example.
Contribution
It proposes a novel pseudo mean method and a unified bilevel optimization framework for mean-variance MDPs, enabling convergence analysis and broad applicability.
Findings
The framework unifies various variance-related optimization algorithms.
The proposed value iteration algorithm converges to a local optimum.
Numerical experiments validate the algorithm's effectiveness in portfolio management.
Abstract
This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function dependent on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable since it suppresses a crucial property -- time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP to a standard one with a redefined reward function in standard form and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization.…
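The bilevel idea described in the abstract can be illustrated with a short sketch. This is not the authors' algorithm verbatim, only one plausible reading under stated assumptions: the discounted mean is the reward averaged over the normalized discounted state occupancy, the variance is measured against that same occupancy, the objective is mean minus beta times variance, and the redefined reward for a fixed pseudo mean y is r - beta * (r - y)^2. All function names, the array layout of P (states x actions x next states), and the parameter beta are hypothetical. The inner level solves a standard discounted MDP for a fixed y; the outer level resets y to the discounted mean of the policy just found.

```python
import numpy as np

def evaluate(P, r, policy, gamma, init):
    """Normalized discounted state occupancy and per-state rewards under `policy`."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]   # (n, n) transition matrix induced by the policy
    r_pi = r[idx, policy]   # (n,) reward collected in each state
    # Solve d^T = (1 - gamma) * init^T (I - gamma * P_pi)^{-1}; d sums to 1.
    d = (1.0 - gamma) * np.linalg.solve((np.eye(n) - gamma * P_pi).T, init)
    return d, r_pi

def solve_inner(P, r, y, beta, gamma, tol=1e-10, max_iter=10_000):
    """Inner level: value iteration on the assumed redefined reward for fixed y."""
    f = r - beta * (r - y) ** 2          # (S, A) reward with the pseudo mean plugged in
    v = np.zeros(P.shape[0])
    for _ in range(max_iter):
        q = f + gamma * (P @ v)          # (S, A, S) @ (S,) -> (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return q.argmax(axis=1)

def mean_variance_iteration(P, r, beta, gamma, init, y0=0.0, max_outer=100, tol=1e-9):
    """Outer level: alternate between the inner MDP and a pseudo-mean update."""
    y = y0
    history = []                         # mean-variance objective after each outer step
    for _ in range(max_outer):
        policy = solve_inner(P, r, y, beta, gamma)
        d, r_pi = evaluate(P, r, policy, gamma, init)
        mean = d @ r_pi
        var = d @ (r_pi - mean) ** 2
        history.append(mean - beta * var)
        if abs(mean - y) < tol:          # pseudo mean matches the true mean: fixed point
            break
        y = mean
    return policy, mean, var, history
```

Under these assumptions the procedure is a coordinate ascent: for a fixed policy, the squared deviation from y is smallest when y equals the policy's discounted mean, so neither the inner nor the outer step can lower the objective. A fixed point is therefore only guaranteed to be a local optimum, which is consistent with the convergence claim listed under Findings.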
Taxonomy
Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Economic theories and models
