A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

Shuai Ma; Xiaoteng Ma; and Li Xia

arXiv:2201.05737 · math.OC · January 19, 2022 · 1 citation

TL;DR

This paper introduces a unified algorithm framework for risk-averse mean-variance optimization in discounted MDPs, addressing time inconsistency issues with a pseudo mean approach and demonstrating convergence and practical effectiveness.

Contribution

It proposes a novel pseudo mean method and a unified bilevel optimization framework for mean-variance MDPs, enabling convergence analysis and broad applicability.
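The pseudo-mean idea can be illustrated with a standard quadratic identity. The derivation below is a sketch under assumed definitions that this page does not state: deterministic rewards $r(s,a)$, discount factor $\gamma \in (0,1)$, a fixed initial distribution $\nu$, risk weight $\beta > 0$, and a variance measured around the normalized discounted mean $\mu_\pi$. The paper's exact formulation may differ.

```latex
\begin{aligned}
\mu_\pi &= (1-\gamma)\,\mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t} r(s_t,a_t)\Big],
\qquad
\sigma^2_\pi = \mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\big(r(s_t,a_t)-\mu_\pi\big)^2\Big],\\
J(\pi) &= \mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t} r(s_t,a_t)\Big] - \beta\,\sigma^2_\pi .\\
\text{For any } \lambda:\quad
&\mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t}\gamma^{t}\big(r(s_t,a_t)-\lambda\big)^2\Big]
= \sigma^2_\pi + \frac{(\mu_\pi-\lambda)^2}{1-\gamma}
\quad\Longrightarrow\quad
J(\pi) = \max_{\lambda}\ \mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t}\gamma^{t} f_\lambda(s_t,a_t)\Big],\\
f_\lambda(s,a) &= r(s,a) - \beta\big(r(s,a)-\lambda\big)^2,
\qquad
\max_{\pi} J(\pi) = \max_{\lambda}\ \max_{\pi}\ \mathbb{E}^{\pi}_{\nu}\Big[\textstyle\sum_{t}\gamma^{t} f_\lambda(s_t,a_t)\Big].
\end{aligned}
```

The cross term vanishes because $\mathbb{E}^{\pi}_{\nu}[\sum_t \gamma^t (r_t - \mu_\pi)] = \mu_\pi/(1-\gamma) - \mu_\pi/(1-\gamma) = 0$, so the maximizing pseudo mean is $\lambda = \mu_\pi$. For a fixed $\lambda$, the inner problem is an ordinary discounted MDP with reward $f_\lambda$, which is what makes dynamic programming usable again.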

Findings

1. The framework unifies various variance-related optimization algorithms.
2. The proposed value iteration algorithm converges to a local optimum.
3. Numerical experiments validate the algorithm's effectiveness in portfolio management.

Abstract

This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric concerns reward variability over the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on a discounted mean, and this dependency makes traditional dynamic programming methods inapplicable because it violates a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function, and we derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization.…
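A minimal sketch of the bilevel structure described in the abstract, under assumptions not stated on this page: a small tabular MDP with deterministic rewards r(s, a), a fixed initial distribution `nu`, risk weight `beta`, and variance measured around the normalized discounted mean (1 - gamma) * E[sum_t gamma^t r_t]. The outer level fixes a pseudo mean `lam`; the inner level solves the resulting standard MDP with reward r - beta * (r - lam)**2 by plain value iteration; `lam` is then reset to the new policy's discounted mean. All function names are hypothetical, and this alternating update is an illustrative choice, not the paper's exact algorithm.

```python
import numpy as np


def policy_values(P, r, pi, gamma):
    """Exact discounted value of deterministic policy pi for per-step reward r.

    P: (S, A, S) transition probabilities, r: (S, A) rewards, pi: (S,) actions.
    """
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, pi]  # (S, S) transition matrix under pi
    r_pi = r[idx, pi]  # (S,) reward vector under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def value_iteration(P, f, gamma, tol=1e-10, max_iter=100_000):
    """Standard value iteration for reward f; returns a greedy deterministic policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_new = (f + gamma * P @ V).max(axis=1)  # P @ V has shape (S, A)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return (f + gamma * P @ V).argmax(axis=1)


def mean_variance(P, r, pi, gamma, nu, beta):
    """Objective J(pi) and discounted mean mu_pi under the ASSUMED definitions above."""
    V_r = policy_values(P, r, pi, gamma)
    mu = (1 - gamma) * nu @ V_r  # normalized discounted mean
    V_dev = policy_values(P, (r - mu) ** 2, pi, gamma)  # discounted squared deviations
    return nu @ V_r - beta * nu @ V_dev, mu


def pseudo_mean_iteration(P, r, gamma, nu, beta, n_outer=50):
    """Hypothetical alternating scheme: fix lam -> solve MDP; fix policy -> lam = its mean."""
    lam, pi, history = 0.0, None, []
    for _ in range(n_outer):
        f = r - beta * (r - lam) ** 2  # redefined reward for the fixed pseudo mean
        new_pi = value_iteration(P, f, gamma)
        J, mu = mean_variance(P, r, new_pi, gamma, nu, beta)
        history.append(J)
        converged = pi is not None and np.array_equal(new_pi, pi)
        pi, lam = new_pi, mu
        if converged:
            break
    return pi, lam, history
```

Resetting `lam` to the current policy's mean cannot lower the objective, and re-solving the inner MDP for a fixed `lam` cannot lower it either, so the values in `history` are nondecreasing. This block-coordinate argument yields a stationary point, in line with the local-optimum finding listed above, not a guaranteed global optimum.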

Taxonomy

Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Economic theories and models