Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
Yilie Huang, Yanwei Jia, Xun Yu Zhou

TL;DR
This paper introduces a reinforcement learning approach for continuous-time mean-variance portfolio selection that learns investment strategies directly from data without estimating market coefficients, demonstrating strong empirical performance.
Contribution
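The paper presents a general data-driven RL approach to continuous-time mean-variance portfolio selection. For multi-stock Black-Scholes markets without factors, it also devises an algorithm with a sublinear regret bound, and it evaluates that algorithm in an extensive empirical study.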
Findings
The RL strategy outperforms traditional methods in volatile markets.
For multi-stock Black-Scholes markets without factors, the algorithm achieves a sublinear regret bound in terms of the Sharpe ratio.
Empirical results show consistent outperformance on S&P 500 data.
Abstract
We study continuous-time mean–variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes, yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL approach that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black–Scholes markets without factors, we further devise an algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of the Sharpe ratio. We then carry out an extensive empirical study implementing this algorithm to compare its performance and trading characteristics, evaluated under a host of common metrics, with a large number of widely employed…
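The abstract states its regret bound in terms of the Sharpe ratio. As a rough illustration of that metric only, here is a minimal sketch of the textbook annualized Sharpe ratio estimated from a return series. It is not the paper's algorithm or its regret definition. The function name `sharpe_ratio`, its parameters, the 252-day annualization, and the sample returns are all assumptions made for this example.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period simple returns.

    risk_free_rate is expressed per period; periods_per_year=252 assumes
    daily data (an assumption of this sketch, not taken from the paper).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    vol = statistics.stdev(excess)  # sample standard deviation
    if vol == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    return mean_excess / vol * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only (not data from the paper).
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
print(sharpe_ratio(daily_returns))
```

Passing `periods_per_year=1` returns the raw per-period ratio, which may be convenient when comparing strategies sampled at the same frequency.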
