Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

Haoran Wang; Xun Yu Zhou

arXiv:1904.11392 · q-fin.PM · May 7, 2019 · 28 cites

TL;DR

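This paper casts continuous-time mean-variance portfolio selection as a reinforcement learning problem, shows that the optimal exploratory policy is Gaussian with time-decaying variance, and reports that the resulting algorithm outperforms an adaptive control based method and a deep neural network based algorithm in the authors' simulations.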

Contribution

It formulates the MV problem as an entropy-regularized, relaxed stochastic control problem, proves that the optimal feedback policy must be Gaussian, and builds an implementable RL algorithm on a policy improvement theorem.

Findings

01

Optimal policies are Gaussian with time-decaying variance.

02

In the authors' simulations, the RL algorithm outperforms both an adaptive control based method and a deep neural network based algorithm by a large margin.

03

The entropy-regularized MV problem is connected to the classical MV problem through solvability equivalence and convergence as the exploration weight decays to zero.

Abstract

We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control based method and a deep neural networks based algorithm by a large margin in our simulations.
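The abstract's central claim, a Gaussian feedback policy whose variance decays over time, can be illustrated with a small sketch. This page does not state the closed form, so the expressions below are an assumption recalled from the paper's main theorem and should be checked against the PDF: mean -(rho/sigma)(x - w) and variance lam/(2 sigma^2) * exp(rho^2 (T - t)). Here rho (Sharpe ratio), sigma (volatility), lam (exploration weight), w (a target-related parameter), and T (horizon) are illustrative names, and all function names are hypothetical.

```python
import math
import random


def policy_mean(x, w, sigma, rho):
    """Mean of the Gaussian policy at wealth x (assumed closed form)."""
    return -(rho / sigma) * (x - w)


def policy_variance(t, T, lam, sigma, rho):
    """Variance of the Gaussian policy at time t (assumed closed form).

    It shrinks as t approaches the horizon T and vanishes as the
    exploration weight lam goes to zero.
    """
    return lam / (2.0 * sigma ** 2) * math.exp(rho ** 2 * (T - t))


def gaussian_entropy(variance):
    """Differential entropy of a one-dimensional Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def sample_allocation(t, x, w, T, lam, sigma, rho, rng):
    """Draw one risky-asset allocation from the exploratory policy."""
    mean = policy_mean(x, w, sigma, rho)
    std = math.sqrt(policy_variance(t, T, lam, sigma, rho))
    return rng.gauss(mean, std)


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
allocation = sample_allocation(
    t=0.0, x=1.0, w=1.5, T=1.0, lam=2.0, sigma=0.5, rho=0.3, rng=rng
)
```

Under these assumptions, the variance at t = 0 exceeds the variance at t = T, so the policy explores most early and least near the horizon. Setting lam to zero collapses the policy to its mean, which mirrors the convergence to the classical MV solution described in the abstract.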

Code & Models

Repositories

Tdjaaleb/Exploratory-Mean-Variance

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization