A Reinforcement Learning Framework for Some Singular Stochastic Control Problems
Zongxia Liang, Xiaodong Luo, Xiang Yu

TL;DR
This paper introduces a continuous-time reinforcement learning framework for a class of singular stochastic control problems without entropy regularization. It characterizes the optimal control as a pair of regions in time and the augmented state space, and devises q-learning algorithms to identify those regions.
Contribution
The paper generalizes existing policy evaluation theory for regular controls to singular controls, introduces zero-order and first-order q-functions for model-free policy iteration, and develops algorithms grounded in these theoretical results.
Findings
Developed a policy improvement theorem based on region iteration.
Established a martingale characterization for the pair of q-functions together with the value function.
Presented numerical experiments demonstrating the proposed algorithms.
Abstract
We develop a continuous-time reinforcement learning framework for a class of singular stochastic control problems without entropy regularization. The optimal singular control is characterized as the optimal singular control law, which is a pair of regions of time and the augmented states. The goal of learning is to identify such an optimal region via the trial-and-error procedure. In this context, we generalize the existing policy evaluation theories with regular controls to learn our optimal singular control law and develop a policy improvement theorem via the region iteration. To facilitate the model-free policy iteration procedure, we further introduce the zero-order and first-order q-functions arising from singular control problems and establish the martingale characterization for the pair of q-functions together with the value function. Based on our theoretical findings, some…
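To make the idea of a control law given by a region more concrete, the following is a minimal sketch and not the paper's construction. It assumes a one-dimensional state with constant drift `mu` and volatility `sigma`, and a single hypothetical action region `{x > boundary}` in which the singular control pushes the state back to the boundary. The function name, parameters, and dynamics are all illustrative assumptions; the paper's regions are defined over time and the augmented states and may look quite different.

```python
import numpy as np


def simulate_threshold_control(x0, boundary, mu=0.1, sigma=0.3,
                               T=1.0, n_steps=1000, seed=0):
    """Euler scheme for dX = mu dt + sigma dW - d(xi).

    Hypothetical example: xi is a nondecreasing control that acts only
    when the uncontrolled state enters the action region {x > boundary},
    pushing the state back onto the boundary.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    xi = np.zeros(n_steps + 1)

    # If the initial state lies in the action region, the control jumps at time 0.
    push = max(x0 - boundary, 0.0)
    x[0] = x0 - push
    xi[0] = push

    for k in range(n_steps):
        # Uncontrolled Euler step.
        x_free = x[k] + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # Minimal push needed to leave the action region.
        push = max(x_free - boundary, 0.0)
        x[k + 1] = x_free - push
        xi[k + 1] = xi[k] + push

    return x, xi


if __name__ == "__main__":
    x, xi = simulate_threshold_control(x0=1.5, boundary=1.0)
    print("max state:", x.max())
    print("total control exerted:", xi[-1])
```

In this sketch the control law is fully described by the region (here, a single threshold), so learning the control amounts to learning the region, which is the viewpoint the abstract describes.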
