Randomized Policy Optimization for Optimal Stopping
Xinyi Guan, Velibor V. Mišić

TL;DR
This paper introduces a novel randomized policy approach for optimal stopping problems, providing theoretical guarantees and demonstrating superior performance over existing methods in benchmark experiments.
Contribution
It proposes a new randomized linear policy framework for optimal stopping, with convergence proofs and performance bounds, advancing beyond deterministic policies.
Findings
Outperforms state-of-the-art methods in option pricing benchmarks
Provides theoretical convergence and performance guarantees
Develops a practical heuristic for the randomized policy optimization problem, which is NP-hard
Abstract
Optimal stopping is the problem of determining when to stop a stochastic system in order to maximize reward, which is of practical importance in domains such as finance, operations management and healthcare. Existing methods for high-dimensional optimal stopping that are popular in practice produce deterministic linear policies -- policies that deterministically stop based on the sign of a weighted sum of basis functions -- but are not guaranteed to find the optimal policy within this policy class given a fixed basis function architecture. In this paper, we propose a new methodology for optimal stopping based on randomized linear policies, which choose to stop with a probability that is determined by a weighted sum of basis functions. We motivate these policies by establishing that under mild conditions, given a fixed basis function architecture, optimizing over randomized linear…
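The distinction the abstract draws between the two policy classes can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the basis functions in `basis`, the logistic mapping from the weighted sum to a stopping probability, and the convention that a path which is never stopped earns zero reward are all hypothetical choices made here, since the abstract only says the probability is "determined by" the weighted sum.

```python
import math
import random


def basis(t, x):
    # Hypothetical basis functions (an assumption): constant, state, time.
    return [1.0, x, float(t)]


def score(weights, t, x):
    # Weighted sum of basis functions at period t and state x.
    return sum(w * f for w, f in zip(weights, basis(t, x)))


def deterministic_stop(weights, t, x):
    # Deterministic linear policy: stop based on the sign of the weighted sum.
    return score(weights, t, x) >= 0.0


def stop_probability(weights, t, x):
    # Randomized linear policy: the weighted sum is mapped to a probability.
    # The logistic link is an assumption; it is written in a numerically
    # stable form so large negative scores do not overflow.
    s = score(weights, t, x)
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def randomized_stop(weights, t, x, rng):
    # Stop with the probability given by the randomized linear policy.
    return rng.random() < stop_probability(weights, t, x)


def average_reward(paths, reward, stop_rule):
    # Sample-average reward of a stopping rule over simulated paths.
    # Assumption: a path on which the rule never stops earns zero reward.
    total = 0.0
    for path in paths:
        for t, x in enumerate(path):
            if stop_rule(t, x):
                total += reward(t, x)
                break
    return total / len(paths)


if __name__ == "__main__":
    weights = [-1.0, 1.0, 0.0]
    paths = [[0.5, 2.0], [0.5, 0.5]]
    rng = random.Random(0)
    det = average_reward(paths, lambda t, x: x,
                         lambda t, x: deterministic_stop(weights, t, x))
    rnd = average_reward(paths, lambda t, x: x,
                         lambda t, x: randomized_stop(weights, t, x, rng))
    print(det, rnd)
```

With a logistic link, the stopping probability exceeds 0.5 exactly when the deterministic policy with the same weights would stop. The deterministic rule is a step function of the weights, while the stopping probability varies smoothly with them.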
Taxonomy
Topics: Advanced Bandit Algorithms Research · Healthcare Operations and Scheduling Optimization · Optimization and Search Problems
