Portfolio management based on value distribution reinforcement learning algorithm

Yan Yang; Tian Wang; Yiding Fu; Jingna Huang; Dong Zhou

PMC · DOI: 10.3389/frai.2025.1709493 · January 30, 2026

Open Access

TL;DR

This paper introduces VD-MEAC, a value distribution maximum entropy actor-critic reinforcement learning method for portfolio management. In experiments on Chinese stock market data, it outperforms benchmark strategies on both average return and Sharpe ratio.

Contribution

A novel portfolio management framework using a value distribution maximum entropy actor-critic reinforcement learning algorithm.

Findings

01. The VD-MEAC strategy achieved an average return of 2.490 using Chinese stock market data.

02. The method produced a Sharpe ratio of 2.978, outperforming benchmark strategies.

03. The approach effectively balances return maximization and risk control in uncertain financial markets.
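
For readers unfamiliar with the metric cited in the findings, the Sharpe ratio divides the mean excess return by the standard deviation of returns. The sketch below is illustrative only: this summary does not state the return frequency, risk-free rate, or annualization convention used in the paper, so the function name `sharpe_ratio`, the default zero risk-free rate, the 252-period annualization factor, and the sample returns are all assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns.

    Assumptions (not taken from the paper): simple per-period returns,
    a constant per-period risk-free rate, and sample standard deviation.
    """
    excess = [r - risk_free for r in returns]
    mean_excess = statistics.fmean(excess)
    std_excess = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if std_excess == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean_excess / std_excess * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
sample_returns = [0.010, -0.005, 0.007, 0.002]
sample_sharpe = sharpe_ratio(sample_returns)
```

With a zero risk-free rate the ratio is unchanged when every return is scaled by the same positive factor, which is why it is read as a risk-adjusted measure rather than a raw return.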

Abstract

In the face of high uncertainty and complexity in financial markets, achieving portfolio return maximization while effectively controlling risk remains a critical challenge. We propose a novel portfolio management framework based on the value distribution maximum entropy actor-critic (VD-MEAC) reinforcement learning algorithm. We establish a framework where the agent’s actions represent portfolio weight adjustments and stock factors serve as state observations. For risk management, the critic network learns the complete distribution of future returns. For return enhancement, we incorporate entropy regularization. We conduct extensive experiments using real market data from the Chinese stock market. Results demonstrate that our VD-MEAC strategy achieves an average return of 2.490 and an average Sharpe ratio of 2.978, significantly outperforming benchmark strategies. These results…
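
The abstract names two ingredients that a short sketch can make concrete: actions that represent portfolio weight adjustments, and a critic that learns a distribution of future returns for risk management. The example below is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a long-only portfolio whose weights come from a softmax over an unconstrained action vector, and it assumes the learned return distribution is available as a list of equally weighted quantile estimates from which a lower-tail average (a CVaR-style risk estimate) is taken. The names `softmax_weights` and `lower_tail_mean`, the `alpha` parameter, and all numbers are hypothetical.

```python
import math


def softmax_weights(action):
    """Map an unconstrained action vector to long-only weights that sum to 1.

    Assumption: the softmax mapping is illustrative; the summary does not
    say how the paper turns actions into portfolio weights.
    """
    shift = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - shift) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def lower_tail_mean(quantiles, alpha=0.25):
    """Average of the worst `alpha` fraction of equally weighted return quantiles.

    Assumption: the critic's return distribution is represented by quantile
    estimates; this lower-tail average is one possible risk summary.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Hypothetical action for three assets and hypothetical critic quantiles.
weights = softmax_weights([0.2, -1.0, 0.5])
tail_risk = lower_tail_mean(
    [-0.04, -0.01, 0.00, 0.02, 0.03, 0.05, 0.06, 0.08], alpha=0.25
)
```

Having the whole distribution rather than only its mean is what makes a tail statistic like this available; a critic that outputs a single expected value could not distinguish two portfolios with the same mean but different downside.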

Figures: 12

Taxonomy

Topics: Stock Market Forecasting Methods · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research