The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning

Yu Li; Yuhan Wu; Shuhua Zhang

arXiv:2505.07537·q-fin.MF·May 13, 2025

TL;DR

This paper applies a reinforcement learning method, the soft actor-critic (SAC) algorithm, to continuous-time multi-asset mean-variance portfolio selection in a time-varying market. To improve stability and learning accuracy, the model parameters are split into three parts that are learned progressively.

Contribution

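The paper introduces an RL-based method for multi-asset mean-variance portfolio selection: a policy iteration process with a proof of convergence, an SAC algorithm built on that process, and a three-part progressive parameter-learning scheme aimed at stability.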

Findings

01 The proposed SAC algorithm shows superior performance in simulated markets under several evaluation criteria.

02 Learning the model parameters progressively, in three parts, improves stability and learning accuracy in the multi-asset setting.

03 Numerical studies on real financial market data also show superior performance.

Abstract

In this paper, we study continuous-time multi-asset mean-variance (MV) portfolio selection using a reinforcement learning (RL) algorithm, specifically the soft actor-critic (SAC) algorithm, in a time-varying financial market. A family of Gaussian portfolio selections is derived, and a policy iteration process is crafted to learn the optimal exploratory portfolio selection. We prove the convergence of the policy iteration process theoretically, and develop the SAC algorithm on that basis. To improve the algorithm's stability and learning accuracy in the multi-asset scenario, we divide the model parameters that influence the optimal portfolio selection into three parts and learn each part progressively. Numerical studies in simulated and real financial markets confirm the superior performance of the proposed SAC algorithm under various criteria.
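To make the phrase "Gaussian portfolio selections" concrete, here is a minimal sketch of what sampling from a Gaussian exploratory allocation policy can look like. It is not the paper's derivation: it assumes a constant-coefficient market and borrows the general shape seen in earlier exploratory mean-variance work (mean linear in the gap between wealth `x` and a target-related level `w`, covariance scaled by an exploration weight `lam`). The function names, parameter values, and the exact mean and covariance formulas are illustrative assumptions; the time-varying case treated in the paper will differ.

```python
import numpy as np

def gaussian_policy(x, w, mu_excess, sigma, lam, t, T):
    """Mean and covariance of an illustrative Gaussian allocation policy.

    ASSUMPTION: constant excess returns `mu_excess` and volatility matrix
    `sigma`; the formulas below are a hypothetical stand-in, not the
    paper's time-varying result.
    """
    asset_cov = sigma @ sigma.T                      # instantaneous return covariance
    inv_cov = np.linalg.inv(asset_cov)
    rho_sq = float(mu_excess @ inv_cov @ mu_excess)  # squared market price of risk
    mean = -(inv_cov @ mu_excess) * (x - w)          # invest more when wealth is below w
    cov = 0.5 * lam * inv_cov * np.exp(rho_sq * (T - t))  # exploration shrinks as t -> T
    return mean, cov

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

# Hypothetical two-asset market.
mu_excess = np.array([0.05, 0.03])
sigma = np.array([[0.20, 0.00],
                  [0.05, 0.15]])

mean, cov = gaussian_policy(x=1.0, w=1.4, mu_excess=mu_excess,
                            sigma=sigma, lam=0.1, t=0.0, T=1.0)

rng = np.random.default_rng(0)
allocation = rng.multivariate_normal(mean, cov)  # one exploratory dollar allocation
print("mean:", mean)
print("sampled allocation:", allocation)
print("policy entropy:", gaussian_entropy(cov))
```

The entropy term is what an entropy-regularized method such as SAC rewards: a larger `lam` widens the covariance, so the agent explores more around the mean allocation.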

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Stock Market Forecasting Methods

Methods: Soft Actor-Critic (SAC) · Policy Iteration