Continuous-time q-learning for Markov regime switching system under Tsallis entropy

Minghui Zhang; Xun Li; Xin Zhang

arXiv:2601.19299 · math.OC · January 28, 2026

TL;DR

This paper introduces continuous-time q-learning algorithms for Markov regime-switching systems under Tsallis entropy regularization. It addresses a difficulty for traditional RL methods, namely that the optimal policy under Tsallis entropy need not be a Gibbs measure, and demonstrates the algorithms' effectiveness in portfolio optimization.

Contribution

It develops continuous-time q-learning algorithms under Tsallis entropy and applies them to regime-switching portfolio optimization, extending continuous-time regime-switching RL beyond the EMV framework to which existing algorithms are often restricted.
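To make "regime-switching market" concrete, here is a minimal simulation sketch of the kind of environment such algorithms interact with. It assumes a two-state continuous-time Markov chain for the regime, regime-dependent drift and volatility for one risky asset, a constant risk-free rate, and a constant fraction of wealth in the risky asset, all discretized with an Euler scheme. Every parameter value and the names `simulate_wealth`, `MU`, `SIGMA`, `RATE`, and `GEN` are hypothetical illustrations, not the paper's model or data.

```python
import math
import random

# Hypothetical two-regime market (0 = bull, 1 = bear); all numbers are
# illustrative assumptions, not parameters taken from the paper.
MU = [0.10, -0.05]      # drift of the risky asset in each regime
SIGMA = [0.15, 0.30]    # volatility of the risky asset in each regime
RATE = 0.02             # risk-free rate (assumed regime-independent)
GEN = [[-0.5, 0.5],     # generator matrix of the regime chain:
       [1.0, -1.0]]     # GEN[i][j] is the jump intensity from regime i to j


def simulate_wealth(frac, x0=1.0, horizon=1.0, n_steps=250, seed=0):
    """Euler scheme for self-financing wealth when a constant fraction `frac`
    is held in the risky asset and the regime follows a 2-state Markov chain."""
    if n_steps <= 0 or horizon <= 0:
        raise ValueError("n_steps and horizon must be positive")
    rng = random.Random(seed)
    dt = horizon / n_steps
    x, regime = x0, 0
    path, regimes = [x], [regime]
    for _ in range(n_steps):
        # Wealth increment using the regime in force at the start of the step.
        dw = rng.gauss(0.0, math.sqrt(dt))
        excess = MU[regime] - RATE
        x += x * (RATE + frac * excess) * dt + x * frac * SIGMA[regime] * dw
        # Leave the current regime with probability ~ (exit intensity) * dt.
        if rng.random() < -GEN[regime][regime] * dt:
            regime = 1 - regime
        path.append(x)
        regimes.append(regime)
    return path, regimes
```

Setting `frac=0.0` removes all market risk, so the path reduces to deterministic risk-free growth, which is a convenient sanity check before plugging in a learned policy.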

Findings

01. Algorithms perform well in numerical experiments
02. Effective for portfolio optimization in regime-switching markets
03. Addresses limitations of traditional RL with Tsallis entropy

Abstract

This paper studies continuous-time q-learning (the continuous-time counterpart of Q-learning) for Markov regime-switching systems under Tsallis entropy regularization. We address a difficulty for traditional RL algorithms: under Tsallis entropy regularization, the optimal policy distribution is not necessarily a Gibbs measure, which often complicates algorithm design. Furthermore, current continuous-time regime-switching RL algorithms have limited universality, as they are often restricted to the EMV framework; this study therefore develops continuous-time q-learning for Markov regime-switching systems based on Tsallis entropy, aiming for a more universally applicable continuous-time RL method. We establish the martingale characterization of the q-function under Tsallis entropy for continuous-time Markov regime-switching systems. Based on this, we design two q-learning algorithms,…
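Why Tsallis regularization breaks the Gibbs form can be seen in a small discrete sketch. It assumes a finite action set and the standard Tsallis entropy S_p(π) = (1 − Σ_a π(a)^p)/(p − 1) with index p > 1, and maximizes Σ_a π(a) q(a) + temperature · S_p(π) over probability vectors, where q stands for q-function values at a fixed time, state, and regime. The paper's exact parameterization and its treatment of continuous action spaces are not shown in this excerpt, so the function name `tsallis_policy` and its parameters are hypothetical. The first-order conditions give a thresholded power of q rather than an exponential, with the threshold ψ found by bisection.

```python
def tsallis_policy(q_values, temperature=1.0, p=2.0, n_iter=100):
    """Maximize sum_a pi(a) q(a) + temperature * S_p(pi) over the simplex,
    with S_p(pi) = (1 - sum_a pi(a)**p) / (p - 1) and p > 1.

    First-order conditions give
        pi(a) = [(p - 1) / (p * temperature) * (q(a) - psi)]_+ ** (1 / (p - 1)),
    where the multiplier psi makes pi sum to one; psi is found by bisection.
    """
    if p <= 1.0:
        raise ValueError("p must exceed 1 (p -> 1 recovers the Gibbs/softmax policy)")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scale = (p - 1.0) / (p * temperature)

    def weights(psi):
        # Thresholded power: actions with q(a) <= psi get exactly zero mass.
        return [max(scale * (q - psi), 0.0) ** (1.0 / (p - 1.0)) for q in q_values]

    # At psi = max(q) every weight vanishes; at psi = max(q) - 1/scale the top
    # action alone has weight 1. The total weight decreases in psi, so the
    # normalizing psi lies in this bracket.
    hi = max(q_values)
    lo = hi - 1.0 / scale
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) > 1.0:
            lo = mid   # too much mass: raise the threshold
        else:
            hi = mid
    w = weights(0.5 * (lo + hi))
    total = sum(w)
    return [x / total for x in w]
```

For example, `tsallis_policy([1.0, 0.0, -5.0], temperature=1.0, p=2.0)` returns approximately `[0.75, 0.25, 0.0]`: the lowest-valued action receives exactly zero probability, which a Gibbs (softmax) policy never assigns. The case p = 2 is the sparsemax projection, and letting p → 1 recovers the Shannon-entropy Gibbs policy.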

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research