Portfolio management based on value distribution reinforcement learning algorithm
Yan Yang, Tian Wang, Yiding Fu, Jingna Huang, Dong Zhou

TL;DR
This paper introduces a new reinforcement learning method for portfolio management that, in experiments on Chinese stock market data, achieves higher returns and better risk-adjusted performance than the benchmark strategies it is compared against.
Contribution
A novel portfolio management framework based on the value distribution maximum entropy actor-critic (VD-MEAC) reinforcement learning algorithm.
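The combination named in this contribution (a critic that models a value distribution, trained under a maximum entropy objective) can be illustrated with a minimal sketch. This page does not say how VD-MEAC parameterizes the distribution, so the sketch assumes a quantile-based critic in the style of QR-DQN together with a SAC-style soft target: every next-state quantile is shifted by the entropy bonus, and a quantile Huber loss fits the predicted quantiles to that target. The function names and the parameters `gamma`, `alpha` (entropy temperature) and `kappa` are hypothetical illustrations, not the authors' code.

```python
from typing import List


def soft_quantile_targets(reward: float, next_quantiles: List[float],
                          next_log_prob: float, gamma: float = 0.99,
                          alpha: float = 0.2, done: bool = False) -> List[float]:
    """Soft distributional Bellman target (hypothetical sketch).

    Each next-state quantile z_j(s', a') with a' ~ pi(.|s') is shifted by the
    entropy bonus -alpha * log pi(a'|s'):
        target_j = r + gamma * (1 - done) * (z_j - alpha * log pi(a'|s'))
    """
    mask = 0.0 if done else 1.0  # no bootstrapping past a terminal step
    return [reward + gamma * mask * (z - alpha * next_log_prob)
            for z in next_quantiles]


def quantile_huber_loss(pred: List[float], target: List[float],
                        kappa: float = 1.0) -> float:
    """Quantile regression loss with Huber smoothing (QR-DQN style).

    pred[i] estimates the tau_i-quantile of the return distribution, with
    tau_i = (i + 0.5) / N. The loss is summed over predicted quantiles and
    averaged over target samples.
    """
    n = len(pred)
    taus = [(i + 0.5) / n for i in range(n)]  # quantile midpoints
    total = 0.0
    for tau, theta in zip(taus, pred):
        for t in target:
            u = t - theta  # TD error for this (quantile, target) pair
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric weight: under-estimates (u > 0) weighted by tau,
            # over-estimates (u < 0) weighted by 1 - tau
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / len(target)
```

In a full actor-critic loop the actor would then be updated to maximize the mean of the critic's quantiles minus `alpha * log pi`; that step is omitted here.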
Findings
The VD-MEAC strategy achieved an average return of 2.490 on real Chinese stock market data.
It achieved an average Sharpe ratio of 2.978, significantly outperforming the benchmark strategies.
The approach balances return maximization and risk control in uncertain financial markets: the critic learns the complete distribution of future returns, and entropy regularization is incorporated to enhance returns.
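For reference, a Sharpe ratio such as the one reported above is the mean excess return divided by the standard deviation of excess returns. This page does not state the return frequency, risk-free rate or annualization the authors used, nor exactly how the 2.490 "average return" is defined. The sketch below therefore assumes simple per-period returns, a default risk-free rate of zero and 252 trading periods per year; the helper names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List


def cumulative_return(returns: List[float]) -> float:
    """Compound simple per-period returns into a total return."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + r
    return wealth - 1.0


def annualized_sharpe(returns: List[float], risk_free: float = 0.0,
                      periods_per_year: int = 252) -> float:
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)  # sample standard deviation; needs >= 2 observations
    if sd == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return math.sqrt(periods_per_year) * mean(excess) / sd
```

For example, `annualized_sharpe([0.01, 0.03], periods_per_year=1)` gives the square root of 2, since the mean is 0.02 and the sample standard deviation is 0.01 times the square root of 2.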
Abstract
In the face of high uncertainty and complexity in financial markets, achieving portfolio return maximization while effectively controlling risk remains a critical challenge. We propose a novel portfolio management framework based on the value distribution maximum entropy actor-critic (VD-MEAC) reinforcement learning algorithm. We establish a framework where the agent’s actions represent portfolio weight adjustments and stock factors serve as state observations. For risk management, the critic network learns the complete distribution of future returns. For return enhancement, we incorporate entropy regularization. We conduct extensive experiments using real market data from the Chinese stock market. Results demonstrate that our VD-MEAC strategy achieves an average return of 2.490 and an average Sharpe ratio of 2.978, significantly outperforming benchmark strategies. These results…
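The state/action framing in the abstract can be sketched as a single environment step. Several details are assumptions not stated on this page: actions are unconstrained scores mapped to long-only, fully invested weights by a softmax; the state concatenates per-stock factor values with the current weights; and the reward is the log portfolio return net of a proportional transaction cost on turnover. All names and the `cost_rate` default are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Map unconstrained action scores to long-only weights summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def observe(factors: List[List[float]], weights: List[float]) -> List[float]:
    """State: each stock's factor values, flattened, then current weights."""
    return [x for row in factors for x in row] + list(weights)


def step(prev_weights: List[float], action_scores: List[float],
         asset_returns: List[float], cost_rate: float = 0.001
         ) -> Tuple[List[float], float]:
    """Rebalance, realize one period of returns, and compute the reward."""
    new_weights = softmax(action_scores)
    # Proportional transaction cost on total turnover
    turnover = sum(abs(w_new - w_old)
                   for w_new, w_old in zip(new_weights, prev_weights))
    gross = sum(w * r for w, r in zip(new_weights, asset_returns))
    net = gross - cost_rate * turnover
    reward = math.log(1.0 + net)  # log return of the portfolio
    # Weights drift with prices before the next decision
    grown = [w * (1.0 + r) for w, r in zip(new_weights, asset_returns)]
    total = sum(grown)
    drifted = [g / total for g in grown]
    return drifted, reward
```

The softmax mapping rules out short positions and leverage by construction; a strategy that allows either would need a different action parameterization.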
Figures 1 to 12
Taxonomy
Topics: Stock Market Forecasting Methods · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research
