Entropy-regularized penalization schemes and reflected BSDEs with singular generators
Daniel Chee, Noufel Frikha, Libo Li

TL;DR
This paper introduces an entropy-regularized penalization scheme for continuous-time optimal stopping problems, providing a smooth approximation of the stopping rule, and analyzes its convergence to solutions of reflected BSDEs with singular generators.
Contribution
It proposes a novel entropy-regularized penalization method for reflected BSDEs, with theoretical analysis and numerical validation in low-dimensional settings.
Findings
The scheme promotes exploration and enables gradient-based learning.
Convergence to solutions of reflected BSDEs with singular generators is established.
Numerical experiments demonstrate the scheme's effectiveness in low-dimensional cases.
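The link between entropy regularization, smoothness, and gradient-based learning can be illustrated with a standard convex-duality identity. This is a minimal sketch and is not taken from the paper: it assumes a Bernoulli stop/continue decision with stopping probability p, a Shannon entropy bonus weighted by a hypothetical temperature `lam`, and a scalar `x` standing for the gap between the payoff and the continuation value. Under those assumptions, the hard penalty max(x, 0) = sup over p in [0, 1] of p*x becomes the softplus lam*log(1 + exp(x/lam)), and the maximizing p is a sigmoid, which is differentiable in x.

```python
import math


def hard_penalty(x: float) -> float:
    """Positive part max(x, 0), i.e. sup over p in [0, 1] of p * x."""
    return max(x, 0.0)


def soft_penalty(x: float, lam: float) -> float:
    """Entropy-regularized value: sup over p in [0, 1] of p * x + lam * H(p),
    with H(p) = -p*log(p) - (1-p)*log(1-p). Closed form: lam * log(1 + exp(x / lam)).
    Written in a numerically stable way."""
    z = x / lam
    return lam * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def stop_probability(x: float, lam: float) -> float:
    """Maximizer p* = sigmoid(x / lam); also the derivative of soft_penalty in x."""
    z = x / lam
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Illustrative values only: x is a hypothetical payoff-minus-continuation gap.
    for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
        print(x, hard_penalty(x), soft_penalty(x, 0.1), stop_probability(x, 0.1))
```

In this toy setting the smooth penalty exceeds the hard one by at most lam*log(2), so the smoothing error vanishes as `lam` tends to 0, while for any `lam` > 0 the stopping probability stays strictly between 0 and 1, which is one way to read "promotes exploration".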
Abstract
This paper extends our previous work to continuous-time optimal stopping, focusing on American options in an exploratory setting. Our first contribution is an entropy-regularized penalization scheme, inspired by classical penalization techniques for reflected BSDEs. It yields a smooth approximation of the stopping rule, promotes exploration, and enables gradient-based learning methods. We prove well-posedness and convergence of the scheme, and illustrate its numerical performance in low-dimensional examples. Our second contribution analyzes the behaviour of the scheme as the penalization parameter grows, showing that the limit solves a reflected BSDE with a logarithmically singular generator, for which we establish existence and uniqueness via a monotone limit argument.
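For orientation, the classical penalization scheme that the abstract refers to can be written as follows. The notation is an assumption made here for illustration (terminal value $\xi$, generator $f$, lower obstacle $L$, Brownian motion $W$, penalization parameter $n$) and need not match the paper's; the entropy-regularized variant and the singular generator obtained in the limit are not reproduced.

```latex
% Penalized BSDE with parameter n (standard form, lower obstacle L):
\[
  Y^n_t = \xi + \int_t^T f(s, Y^n_s, Z^n_s)\,ds
        + n \int_t^T (Y^n_s - L_s)^-\,ds
        - \int_t^T Z^n_s\,dW_s , \qquad 0 \le t \le T .
\]
% Reflected BSDE obtained as the monotone limit n -> infinity:
\[
  Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds + K_T - K_t - \int_t^T Z_s\,dW_s ,
  \qquad Y_t \ge L_t , \qquad \int_0^T (Y_s - L_s)\,dK_s = 0 ,
\]
% where K is continuous, nondecreasing, and K_0 = 0.
```

In the classical setting, $Y^n$ increases to $Y$ as $n \to \infty$, and the penalty term $n \int_0^t (Y^n_s - L_s)^-\,ds$ plays the role of the reflecting process $K$.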
Taxonomy
Topics: Stochastic processes and financial applications · Extremum Seeking Control Systems · Reinforcement Learning in Robotics
