Large scale continuous-time mean-variance portfolio allocation via reinforcement learning
Haoran Wang

TL;DR
This paper applies reinforcement learning to large-scale continuous-time mean-variance portfolio allocation and reports that, in tests on S&P 500 stocks, the method outperforms econometric methods and a deep RL method by large margins.
Contribution
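Formulates the exploratory mean-variance problem in high dimensions, shows that a multivariate Gaussian feedback policy with time-decaying variance is optimal for trading off exploration and exploitation, and derives a scalable, data-efficient RL algorithm from a policy improvement theorem.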
Findings
Consistently achieves over 10% annualized returns in empirical tests on S&P 500 stocks.
Outperforms econometric methods and a deep RL method by large margins.
Results hold for both long and medium investment terms, with monthly and daily trading.
Abstract
We propose to solve the large-scale Markowitz mean-variance (MV) portfolio allocation problem using reinforcement learning (RL). By adopting the recently developed continuous-time exploratory control framework, we formulate the exploratory MV problem in high dimensions. We further show the optimality of a multivariate Gaussian feedback policy, with time-decaying variance, in trading off exploration and exploitation. Based on a provable policy improvement theorem, we devise a scalable and data-efficient RL algorithm and conduct large-scale empirical tests using data from the S&P 500 stocks. We find that our method consistently achieves over 10% annualized returns and that it outperforms econometric methods and the deep RL method by large margins, for both long and medium terms of investment with monthly and daily trading.
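To make the phrase "multivariate Gaussian feedback policy, with time-decaying variance" concrete, the sketch below samples an allocation vector from such a policy. It is only an illustration under stated assumptions, not the paper's algorithm: the linear-in-wealth mean, the exponential variance schedule, and every name and parameter (`policy_mean`, `policy_cov`, `sample_allocation`, `gain`, `target`, `base_cov`, `decay_rate`) are hypothetical choices made here.

```python
import numpy as np


def policy_mean(wealth, target, gain):
    """Mean allocation: linear feedback on the gap between wealth and a target.

    Assumption: the mean is -gain * (wealth - target), so positions are
    larger when current wealth is further below the target.
    """
    return -gain * (wealth - target)


def policy_cov(base_cov, t, horizon, decay_rate):
    """Exploration covariance that shrinks as t approaches the horizon.

    Assumption: an exponential schedule, largest at t = 0 and equal to
    base_cov at t = horizon.
    """
    return base_cov * np.exp(decay_rate * (horizon - t))


def sample_allocation(rng, wealth, t, *, target, gain, base_cov, horizon, decay_rate):
    """Draw one allocation vector from the Gaussian policy at time t."""
    mean = policy_mean(wealth, target, gain)
    cov = policy_cov(base_cov, t, horizon, decay_rate)
    return rng.multivariate_normal(mean, cov)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gain = np.array([1.0, 2.0, 3.0])   # hypothetical per-asset feedback gains
    base_cov = np.eye(3) * 0.01        # hypothetical terminal exploration covariance
    for t in (0.0, 0.5, 1.0):
        u = sample_allocation(
            rng, wealth=0.9, t=t, target=1.0, gain=gain,
            base_cov=base_cov, horizon=1.0, decay_rate=2.0,
        )
        print(f"t={t:.1f} allocation={u}")
```

The mean carries the exploitation part (a deterministic response to current wealth), while the covariance carries exploration. Shrinking the covariance over time means the agent experiments more early in the investment period and less as the horizon approaches.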
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Financial Markets and Investment Strategies
