Exploratory Optimal Stopping: A Singular Control Formulation
Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

TL;DR
This paper formulates optimal stopping as a singular control problem with exploration incentives, introducing RL algorithms for solving it and providing guarantees for their performance.
Contribution
It presents a novel singular control formulation for exploratory optimal stopping and develops scalable RL algorithms with theoretical guarantees.
Findings
The entropy-regularized stopping problem encourages exploration.
A unique optimal exploratory strategy is identified via dynamic programming.
The model-free RL method scales with neural networks.
Abstract
This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time: specifically, a bounded, non-decreasing, càdlàg control process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing the performance criterion with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control with finite-fuel, where the regularized free boundary becomes the graph of a function mapping the state variable of the original stopping problem into the probability of stopping. We address this singular control…
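As a rough illustration of the entropy penalty mentioned in the abstract, the sketch below evaluates a cumulative residual entropy on a time grid. It assumes the standard definition, minus the integral of S(t) log S(t), where S(t) = 1 - xi_t is the probability of not having stopped by time t and xi_t is the stopping-probability control. The function name, the grid, and the trapezoidal discretization are illustrative assumptions; the paper's exact penalty (for example, any discounting or temperature weight) is not shown in this excerpt.

```python
import numpy as np


def cumulative_residual_entropy(times, stop_prob):
    """Approximate -integral of (1 - xi_t) * log(1 - xi_t) dt on a time grid.

    times     : increasing time points t_0 < t_1 < ... < t_N
    stop_prob : xi_t = P(stopped by t) at those points; must be
                non-decreasing with values in [0, 1]
    (Hypothetical helper for illustration, not the paper's code.)
    """
    times = np.asarray(times, dtype=float)
    xi = np.asarray(stop_prob, dtype=float)
    if times.shape != xi.shape or times.ndim != 1:
        raise ValueError("times and stop_prob must be 1-D arrays of equal length")
    if np.any(np.diff(times) <= 0):
        raise ValueError("times must be strictly increasing")
    if np.any(np.diff(xi) < 0) or xi.min() < 0.0 or xi.max() > 1.0:
        raise ValueError("stop_prob must be non-decreasing with values in [0, 1]")

    survival = 1.0 - xi
    # Use the convention 0 * log(0) = 0 for points where stopping is certain.
    integrand = np.zeros_like(survival)
    positive = survival > 0.0
    integrand[positive] = -survival[positive] * np.log(survival[positive])

    # Trapezoidal rule over the grid.
    widths = np.diff(times)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * widths))


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 101)
    deterministic = np.where(t >= 0.5, 1.0, 0.0)  # stop at t = 0.5 for sure
    randomized = 1.0 - np.exp(-2.0 * t)           # exponential stopping law
    print(cumulative_residual_entropy(t, deterministic))
    print(cumulative_residual_entropy(t, randomized))
```

Under this definition a deterministic stopping rule carries no entropy, while a genuinely randomized one yields a positive value, which is one way to read the abstract's claim that the penalty rewards exploration.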
Taxonomy
Topics: Advanced Control Systems Optimization · Optimization and Search Problems
Methods: Entropy Regularization
