Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy
Lijun Bo, Yijie Huang, Xiang Yu, Tingting Zhang

TL;DR
This paper develops continuous-time q-learning algorithms for jump-diffusion models under Tsallis entropy regularization, characterizes the optimal policy explicitly, and tests the algorithms on dark pool liquidation and control problems.
Contribution
It introduces novel q-learning algorithms under Tsallis entropy in continuous time, including explicit policy characterization and actor-critic methods for jump-diffusion models.
Findings
Optimal policies are explicitly characterized as distributions with compact support.
The proposed algorithms perform well in dark pool liquidation and control problems.
Tsallis entropy regularization leads to non-Gibbs optimal policies.
Abstract
This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization. In contrast to Shannon entropy, the general form of Tsallis entropy means that the optimal policy is not necessarily a Gibbs measure. Here, a Lagrange multiplier and the KKT conditions are needed to ensure that the learned policy is a probability density function. As a consequence, the characterization of the optimal policy using the q-function also involves a Lagrange multiplier. In response, we establish the martingale characterization of the q-function and devise two q-learning algorithms depending on whether the Lagrange multiplier can be derived explicitly or not. In the latter case, we consider different parameterizations of the optimal q-function and the optimal policy, and update them…
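To illustrate why a Lagrange multiplier enters the policy characterization, the sketch below works through one special case and is not the paper's algorithm. It assumes a fixed state, a Tsallis index of 2, and a regularizer of the form γ∫(π(a) − π(a)²)da, so that maximizing ∫q(a)π(a)da plus the regularizer subject to ∫π = 1 and π ≥ 0 gives π(a) = max(q(a) − λ, 0) / (2γ), with λ chosen so that π integrates to one. The function name `tsallis2_policy`, the action grid, and the quadratic q-function are hypothetical choices for illustration only.

```python
import numpy as np

def tsallis2_policy(q_values, actions, gamma, n_iter=200):
    """Sketch: policy density under index-2 Tsallis regularization (assumed form).

    Returns pi(a) = max(q(a) - lam, 0) / (2 * gamma) on a uniform action grid,
    where the multiplier lam is found by bisection so that pi integrates to 1.
    """
    da = actions[1] - actions[0]  # uniform grid spacing (assumption)

    def mass(lam):
        # Riemann-sum approximation of the integral of the candidate density
        return np.sum(np.maximum(q_values - lam, 0.0)) * da / (2.0 * gamma)

    # mass is non-increasing in lam: mass(hi) = 0 and mass(lo) >= 1
    hi = q_values.max()
    lo = q_values.max() - 2.0 * gamma / da
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    pi = np.maximum(q_values - lam, 0.0) / (2.0 * gamma)
    return pi, lam

# Hypothetical example: a concave quadratic q-function on a grid of actions
actions = np.linspace(-3.0, 3.0, 601)
q_values = -actions**2
gamma = 0.5

pi, lam = tsallis2_policy(q_values, actions, gamma)

# Gibbs density for comparison (Shannon entropy case): positive everywhere
da = actions[1] - actions[0]
gibbs = np.exp(q_values / gamma)
gibbs /= gibbs.sum() * da
```

In this toy case the Tsallis density vanishes outside a bounded interval of actions, whereas the Gibbs density is strictly positive on the whole grid. When the multiplier has no closed form, a numerical root search of this kind is one simple option; the paper instead handles that case through separate parameterizations of the q-function and the policy.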
Taxonomy
Topics: Statistical Mechanics and Entropy · Fractional Differential Equations Solutions · Model Reduction and Neural Networks
Methods: Q-Learning · Entropy Regularization
