# Portfolio management based on value distribution reinforcement learning algorithm

**Authors:** Yan Yang, Tian Wang, Yiding Fu, Jingna Huang, Dong Zhou

PMC · DOI: 10.3389/frai.2025.1709493 · 2026-01-30

## TL;DR

This paper introduces a reinforcement learning method for portfolio management. In experiments on Chinese stock market data, it delivers higher returns and better risk-adjusted performance than benchmark strategies.

## Contribution

A novel portfolio management framework built on the value distribution maximum entropy actor-critic (VD-MEAC) reinforcement learning algorithm.
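The algorithm's name combines two standard ingredients. "Maximum entropy" refers to an objective that rewards policy entropy alongside return. "Value distribution" means the critic models the full random return $Z(s,a)$ rather than only its mean $Q(s,a)=\mathbb{E}[Z(s,a)]$. The forms below are the textbook versions from maximum-entropy RL (as in soft actor-critic) and distributional RL. They are shown for orientation only and are not confirmed to match the paper's exact notation. Here $\alpha$ is an entropy temperature and $\gamma$ a discount factor.

$$
J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\big(r_t + \alpha\,\mathcal{H}(\pi(\cdot \mid s_t))\big)\Big]
$$

$$
Z(s_t, a_t) \overset{D}{=} r_t + \gamma\big(Z(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1})\big), \qquad a_{t+1} \sim \pi(\cdot \mid s_{t+1})
$$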

## Key findings

- The VD-MEAC strategy achieved an average return of 2.490 using Chinese stock market data.
- The method achieved an average Sharpe ratio of 2.978, outperforming the benchmark strategies.
- The approach effectively balances return maximization and risk control in uncertain financial markets.
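The Sharpe ratio reported above is mean excess return divided by return volatility. The summary does not state the paper's convention for return frequency, risk-free rate, or annualization. The sketch below therefore makes its own assumptions: daily simple returns, 252 trading days per year, and a constant annual risk-free rate. The function name `sharpe_ratio` is hypothetical.

```python
import numpy as np


def sharpe_ratio(period_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    Assumes daily data (252 periods/year) and an annual risk-free rate that is
    converted to a per-period rate by simple division.
    """
    r = np.asarray(period_returns, dtype=float)
    excess = r - risk_free_rate / periods_per_year  # per-period excess return
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # undefined when returns have no variability
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


# Toy example: mean 0.02, sample std 0.01, so the ratio is 2 * sqrt(252).
print(sharpe_ratio([0.01, 0.02, 0.03]))
```

Changing `periods_per_year` (for example to 52 for weekly data) changes the annualization factor, so Sharpe ratios are only comparable across strategies evaluated under the same convention.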

## Abstract

In the face of high uncertainty and complexity in financial markets, achieving portfolio return maximization while effectively controlling risk remains a critical challenge.

We propose a novel portfolio management framework based on the value distribution maximum entropy actor-critic (VD-MEAC) reinforcement learning algorithm. We establish a framework where the agent’s actions represent portfolio weight adjustments and stock factors serve as state observations. For risk management, the critic network learns the complete distribution of future returns. For return enhancement, we incorporate entropy regularization.
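The abstract does not say how actor outputs become portfolio weights, what the per-step reward is, or how the critic parameterizes the return distribution. The NumPy sketch below illustrates one common way to realize each piece, under stated assumptions:

- Softmax maps raw actions to long-only weights.
- The reward is the one-period log growth of portfolio value, with an optional proportional turnover cost.
- The critic's distribution is represented by quantiles trained with a QR-DQN-style quantile Huber loss.
- The entropy regularization enters through a SAC-style soft target.

All function names and the cost term are hypothetical, and the paper's VD-MEAC may differ in any of these choices.

```python
import numpy as np


def action_to_weights(logits):
    """Map raw actor outputs to long-only portfolio weights (softmax is an assumption)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def portfolio_log_return(weights, price_relatives, prev_weights=None, cost_rate=0.0):
    """Hypothetical reward: one-period log growth of portfolio value, less a
    proportional transaction cost charged on turnover."""
    gross = float(np.dot(weights, price_relatives))
    turnover = 0.0 if prev_weights is None else float(np.abs(weights - prev_weights).sum())
    return float(np.log(gross * (1.0 - cost_rate * turnover)))


def soft_target_samples(reward, next_quantiles, next_log_prob, gamma=0.99, alpha=0.2, done=False):
    """Entropy-regularized distributional target r + gamma * (Z(s', a') - alpha * log pi(a'|s'))."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * (np.asarray(next_quantiles, dtype=float) - alpha * next_log_prob)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """QR-DQN-style quantile Huber loss between N predicted quantiles of Z(s, a)
    and M target samples (sum over predicted quantiles, mean over targets)."""
    pred = np.asarray(pred_quantiles, dtype=float)
    target = np.asarray(target_samples, dtype=float)
    n = pred.shape[0]
    taus = (np.arange(n) + 0.5) / n  # quantile midpoints
    u = target[None, :] - pred[:, None]  # pairwise errors, shape (N, M)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(taus[:, None] - (u < 0).astype(float))  # asymmetric quantile weight
    return float((weight * huber / kappa).sum(axis=0).mean())


# Usage with toy numbers (4 assets, 5 quantiles).
w = action_to_weights([0.0, 0.0, 0.0, 0.0])  # equal logits give equal weights
r = portfolio_log_return(w, np.array([1.01, 0.99, 1.02, 1.00]))
target = soft_target_samples(r, next_quantiles=np.linspace(-0.02, 0.02, 5), next_log_prob=-1.0)
loss = quantile_huber_loss(np.zeros(5), target)
print(w, r, loss)
```

Learning several quantiles instead of a single mean gives the critic access to the spread and tails of future returns, which is the information a risk-aware portfolio agent can exploit. The `- alpha * log pi` term in the target is what turns the entropy bonus into part of the value estimate.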

We conduct extensive experiments using real market data from the Chinese stock market. Results demonstrate that our VD-MEAC strategy achieves an average return of 2.490 and an average Sharpe ratio of 2.978, significantly outperforming benchmark strategies.

These results validate the effectiveness of our approach in practical portfolio management scenarios.

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12903270/full.md

---
Source: https://tomesphere.com/paper/PMC12903270