A New DAPO Algorithm for Stock Trading

Ruijian Zha; Bojun Liu

arXiv:2505.06408 · cs.CE · May 27, 2025

TL;DR

This paper introduces a reinforcement learning trading agent that combines an improved policy optimization method with large language model signals, reaching a 230.49% cumulative return on NASDAQ-100 data while cutting training time from about 8 hours to 2.5 hours.

Contribution

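It presents an improved Group Relative Policy Optimization (GRPO) algorithm, augmented with ideas from DAPO and combined with LLM-based risk and sentiment signals from financial news, and reports higher returns with shorter training time and lower RAM usage in stock trading.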
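As a rough illustration of what a GRPO-style update with DAPO-style modifications can look like, the sketch below computes group-relative advantages, applies a clipped surrogate with separate lower and upper clip ranges, and filters out groups whose rewards are all identical. This summary does not say which DAPO ideas the authors adopt, so the choice of decoupled clipping and dynamic sampling is an assumption here, as are all function names and the default values of `eps_low` and `eps_high`. It is not the authors' implementation.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def decoupled_clip_objective(ratio, advantages, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate with separate lower/upper clip ranges (assumed values)."""
    ratio = np.asarray(ratio, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return np.minimum(ratio * advantages, clipped * advantages).mean()


def keep_group(rewards):
    """Dynamic-sampling filter: drop groups with no reward variance,
    since their group-relative advantages are all zero."""
    return float(np.std(np.asarray(rewards, dtype=float))) > 0.0


# Hypothetical group of three sampled trajectories and their policy ratios.
adv = group_relative_advantages([1.0, 2.0, 3.0])
obj = decoupled_clip_objective([1.5, 1.0, 0.5], [1.0, 0.0, -1.0])
```

With a wider upper range than lower range, probability ratios above 1 are allowed to grow slightly more before clipping than ratios below 1 are allowed to shrink.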

Findings

01

Cumulative return of 230.49% on NASDAQ-100

02

Training time reduced from about 8 hours to 2.5 hours over 100 epochs

03

Outperforms the CPPO-DeepSeek baseline, with an information ratio of 0.37 and markedly lower RAM usage

Abstract

Recent advances in reinforcement learning, such as Dynamic Sampling Policy Optimization (DAPO), show strong performance when paired with large language models (LLMs). Motivated by this success, we ask whether similar gains can be realized in financial trading. We design a trading agent that combines an improved Group Relative Policy Optimization (GRPO) algorithm, augmented with ideas from DAPO, with LLM-based risk and sentiment signals extracted from financial news. On the NASDAQ-100 index (FNSPID dataset), our agent attains a cumulative return of 230.49 percent and an information ratio of 0.37, outperforming the CPPO-DeepSeek baseline. It also cuts training time from about 8 hours to 2.5 hours over 100 epochs while markedly reducing RAM usage. The proposed RL-LLM framework offers a scalable path toward data-efficient trading agents. Code: https://github.com/Ruijian-Zha/FinRL-DAPO-SR/
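For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute a cumulative return and an information ratio from per-period returns. The abstract does not state the exact convention used (benchmark choice, annualization, sample versus population standard deviation), so the definitions and function names here are assumptions rather than the paper's evaluation code.

```python
import numpy as np


def cumulative_return(returns):
    """Compound per-period simple returns into one total return."""
    returns = np.asarray(returns, dtype=float)
    return float(np.prod(1.0 + returns) - 1.0)


def information_ratio(returns, benchmark_returns):
    """Mean active return divided by tracking error (sample std, not annualized)."""
    active = np.asarray(returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    tracking_error = active.std(ddof=1)
    return float(active.mean() / tracking_error)


# Hypothetical toy data, not taken from the paper.
total = cumulative_return([0.10, 0.10])
ir = information_ratio([0.02, 0.01, 0.03], [0.01, 0.01, 0.01])
```

Under this definition, a cumulative return of 230.49 percent means the portfolio ends at 3.3049 times its starting value.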

Code & Models

Repositories

ruijian-zha/finrl-dapo-sr
PyTorch · Official