Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification

Rui Sun; Yifan Sun; Sheng Xu; Li Zhao; Jing Li; Daxin Jiang; Cheng Hua; Zuo Bai

arXiv:2601.03948 · cs.AI · January 9, 2026

Open Access

TL;DR

This paper introduces Trade-R1, a framework that improves reinforcement learning in stochastic financial environments by verifying the reasoning process behind each decision. Structured reasoning verification and new reward integration strategies reduce reward hacking and improve decision accuracy.

Contribution

Trade-R1 presents a new process-level reasoning verification method and reward integration strategies to improve RL performance in noisy, stochastic financial markets.

Findings

01. Trade-R1 reduces reward hacking in RL for financial decision-making tasks.

02. Dynamically weighted Semantic Reward (DSR) improves cross-market generalization.

03. Structured reasoning verification enhances decision validity in stochastic environments.

Abstract

Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to achieve remarkable reasoning in domains like mathematics and coding, where verifiable rewards provide clear signals. However, extending this paradigm to financial decision-making is challenged by the market's stochastic nature: rewards are verifiable but inherently noisy, causing standard RL to degenerate into reward hacking. To address this, we propose Trade-R1, a model training framework that bridges verifiable rewards to stochastic environments via process-level reasoning verification. Our key innovation is a verification method that transforms the problem of evaluating reasoning over lengthy financial documents into a structured Retrieval-Augmented Generation (RAG) task. We construct a triangular consistency metric, assessing pairwise alignment between retrieved evidence, reasoning chains, and decisions to serve as a…
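As a rough illustration of the triangular consistency idea described in the abstract, the sketch below scores the three pairs (evidence and reasoning, evidence and decision, reasoning and decision) and averages them. This is not the paper's implementation: the function names, the token-overlap `jaccard` scorer (a crude stand-in for whatever alignment judge the authors use), and the choice to aggregate by a plain mean are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1].

    Hypothetical stand-in for an alignment scorer; the paper's
    actual pairwise scoring method is not specified here.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def triangular_consistency(evidence: str, reasoning: str, decision: str,
                           score=jaccard) -> dict:
    """Score each of the three pairs, then average them.

    The mean aggregation is an assumption, not the paper's formula.
    """
    parts = {"evidence": evidence, "reasoning": reasoning, "decision": decision}
    pair_scores = {
        f"{x}-{y}": score(parts[x], parts[y])
        for x, y in combinations(parts, 2)
    }
    overall = sum(pair_scores.values()) / len(pair_scores)
    return {**pair_scores, "overall": overall}


if __name__ == "__main__":
    result = triangular_consistency(
        evidence="quarterly revenue rose and margins widened",
        reasoning="revenue rose and margins widened so earnings momentum is positive",
        decision="buy because earnings momentum is positive",
    )
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Passing a different `score` callable (for example, one backed by an LLM judge or an embedding model) would change only the pairwise scorer while keeping the three-way structure intact.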

Taxonomy

Topics: Stock Market Forecasting Methods · Explainable Artificial Intelligence (XAI) · Topic Modeling