Reinforcement Learning for Speculative Trading under Exploratory Framework

Yun Zhao; Alex S.L. Tse; Harry Zheng

arXiv:2604.02035·q-fin.MF·April 3, 2026

TL;DR

This paper casts speculative trading as a sequential optimal stopping problem over entry and exit times within an exploratory reinforcement learning framework, derives closed-form Gibbs distributions as the optimal policy, and demonstrates the method in a pairs-trading application.

Contribution

It relaxes the entry and exit stopping times to jump times of Cox processes with bounded intensity controls, regularizes the objective with Shannon's differential entropy, and provides both theoretical analysis and a practical algorithm.

Findings

01. A system of exploratory HJB equations with closed-form Gibbs distributions as the optimal policy.

02. Error estimates and convergence of the RL objective to the value function of the original problem.

03. Implementation in a pairs-trading application.

Abstract

We study a speculative trading problem within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The problem is formulated as a sequential optimal stopping problem over entry and exit times under a general utility function and price process. We first consider a relaxed version of the problem in which the stopping times are modeled by the jump times of Cox processes driven by bounded, non-randomized intensity controls. Under the exploratory formulation, the agent's randomized control is characterized via a probability measure over the jump intensities, and the objective function is regularized by Shannon's differential entropy. This yields a system of exploratory HJB equations and closed-form Gibbs distributions as the optimal policy. Error estimates and convergence of the RL objective to the value function of the original problem are established.…
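
To make the Gibbs-policy idea concrete, the following is a minimal sketch of a generic entropy-regularized intensity policy, not the paper's exact formulas, which are not reproduced on this page. It assumes the reward is linear in the intensity, so that maximizing the expected value of `intensity * gain` plus `temperature` times the differential entropy over densities on `[0, max_intensity]` gives a density proportional to `exp(intensity * gain / temperature)`. The names `gain`, `temperature`, and `max_intensity` are hypothetical and chosen for illustration only.

```python
import math
import random


def gibbs_mean_intensity(gain, temperature, max_intensity):
    """Mean of the density proportional to exp(lam * gain / temperature) on [0, max_intensity].

    Assumption (not from the paper): the reward is linear in the intensity lam.
    """
    a = gain / temperature
    x = a * max_intensity
    if abs(x) < 1e-6:
        # Near-zero gain: the density is almost uniform; use a first-order expansion.
        return max_intensity / 2 + a * max_intensity ** 2 / 12
    if x < 0:
        # Reflection symmetry avoids overflow for strongly negative gains.
        return max_intensity - gibbs_mean_intensity(-gain, temperature, max_intensity)
    return max_intensity / (-math.expm1(-x)) - 1 / a


def sample_gibbs_intensity(gain, temperature, max_intensity, rng):
    """Draw one intensity from the same truncated-exponential density by inverse CDF."""
    a = gain / temperature
    x = a * max_intensity
    u = rng.random()
    if abs(x) < 1e-12:
        return u * max_intensity  # uniform limit
    if x > 0:
        # Sample the mirrored (decreasing) density, then reflect back.
        return max_intensity - math.log1p(u * math.expm1(-x)) / (-a)
    return math.log1p(u * math.expm1(x)) / a


if __name__ == "__main__":
    rng = random.Random(0)
    for g in (-1.0, 0.0, 1.0):
        print(g, gibbs_mean_intensity(g, 0.1, 2.0), sample_gibbs_intensity(g, 0.1, 2.0, rng))
```

Under these assumptions, a positive `gain` (stopping is attractive) pushes the sampled intensity toward `max_intensity`, a negative one pushes it toward zero, and a larger `temperature` flattens the density toward uniform, which is the exploration effect of the entropy term.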
