Reinforcement Learning for Optimal Stopping in POMDPs with Application to Quickest Change Detection

Austin Cooper; Sean Meyn

arXiv:2512.22347 · math.OC · December 30, 2025

TL;DR

This paper applies reinforcement learning, specifically Q-learning, to optimal stopping problems in POMDPs, with quickest change detection as the motivating application. The approach comes with convergence guarantees under linear function approximation and performs near-optimally in numerical experiments.

Contribution

It introduces a Q-learning algorithm for partially observed optimal stopping problems, with convergence analysis and application to quickest change detection.
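As a point of reference only, here is a minimal sketch of the classical fully observed version of Q-learning for optimal stopping with a linear architecture Q_θ(x) = ψ(x)ᵀθ (the Tsitsiklis and Van Roy form, not the authors' partially observed algorithm). Q(x) is the cost of continuing from state x, s(x) is the cost of stopping there, and the update along a single trajectory is θ_{n+1} = θ_n + α_n ψ(X_n) [c(X_n) + γ min(s(X_{n+1}), ψ(X_{n+1})ᵀθ_n) - ψ(X_n)ᵀθ_n]. The discounted cost formulation, the toy chain, the step-size schedule α_n = 10/(n+10), and the helper names (`q_learning_stopping`, `exact_continuation_q`, `stop_set`) are all assumptions made for illustration. With one-hot features the linear class contains Q*, so the learned estimate can be checked against the fixed point computed directly.

```python
import numpy as np


def exact_continuation_q(P, c, s, gamma, iters=500):
    """Fixed point of Q(x) = c(x) + gamma * E[min(s(X'), Q(X')) | X = x].

    Computed by repeated substitution; the map is a gamma-contraction in the
    sup norm, so this converges for 0 < gamma < 1.
    """
    q = np.zeros(len(c))
    for _ in range(iters):
        q = c + gamma * P @ np.minimum(s, q)
    return q


def q_learning_stopping(P, c, s, psi, gamma, n_steps, seed=0, x0=0):
    """Q-learning for optimal stopping with Q_theta(x) = psi(x) @ theta.

    Stopping does not change the dynamics, so the chain is simulated without
    ever stopping and theta is updated along one long trajectory.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(psi.shape[1])
    x = x0
    for n in range(n_steps):
        x_next = rng.choice(len(c), p=P[x])
        # Target uses the cheaper of "stop now" and "continue" at the next state.
        target = c[x] + gamma * min(s[x_next], psi[x_next] @ theta)
        td = target - psi[x] @ theta
        alpha = 10.0 / (n + 10.0)  # hypothetical step-size schedule
        theta = theta + alpha * td * psi[x]
        x = x_next
    return theta


def stop_set(theta, psi, s):
    """States where stopping costs no more than the estimated continuation cost."""
    q = psi @ theta
    return [x for x in range(len(s)) if s[x] <= q[x]]


# Toy 3-state chain (illustrative numbers, not from the paper).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
c = np.array([1.0, 0.5, 2.0])   # per-step cost of continuing
s = np.array([3.0, 4.0, 1.0])   # one-time cost of stopping
psi = np.eye(3)                 # one-hot features: the linear class contains Q*
gamma = 0.5

theta = q_learning_stopping(P, c, s, psi, gamma, n_steps=50_000)
q_star = exact_continuation_q(P, c, s, gamma)
print(np.round(theta, 3), np.round(q_star, 3))
print(stop_set(theta, psi, s))
```

Replacing `psi` with a lower-dimensional feature matrix gives the general linear-architecture update; the limit is then a projected fixed point rather than Q* itself, which is why the choice of basis functions matters for both stability and policy quality.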

Findings

1. Q-learning converges under linear function approximation, given suitable assumptions on the basis functions.
2. The proposed policies are near-optimal in several scenarios.
3. Numerical experiments show performance close to the theoretical best.

Abstract

The field of quickest change detection (QCD) focuses on the design and analysis of online algorithms that estimate the time at which a significant event occurs. In this paper, design and analysis are cast in a Bayesian framework, where QCD is formulated as an optimal stopping problem with partial observations. An approximately optimal detection algorithm is sought using techniques from reinforcement learning. The contributions of the paper are summarized as follows: (i) A Q-learning algorithm is proposed for the general partially observed optimal stopping problem. It is shown to converge under linear function approximation, given suitable assumptions on the basis functions. An example is provided to demonstrate that these assumptions are necessary to ensure algorithmic stability. (ii) Prior theory motivates a particular choice of features in applying Q-learning to QCD. It is shown that,…
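In a Bayesian QCD formulation, the quantity that matters is the posterior probability p_n that the change has already occurred given Y_1, ..., Y_n; tracking it turns the partially observed problem into a stopping problem on a fully observed belief process. A minimal sketch of the classical Shiryaev recursion follows, assuming a geometric prior with parameter ρ on the change time, i.i.d. N(0, 1) observations before the change and N(μ, 1) after it, and p_0 = 0. This standard model may differ from the paper's. For the cost "probability of false alarm plus c times expected detection delay", the optimal rule in this model is known to stop the first time p_n crosses a threshold; the threshold 0.9 and the helper names `shiryaev_update` and `shiryaev_detect` are hypothetical choices for illustration.

```python
import math


def shiryaev_update(p, y, rho, mu):
    """One step of the Shiryaev posterior recursion p_{n-1} -> p_n.

    p   : P(change has occurred by time n-1 | Y_1..Y_{n-1})
    y   : new observation Y_n
    rho : parameter of the geometric prior on the change time
    mu  : post-change mean (pre-change N(0, 1), post-change N(mu, 1))
    """
    p_pred = p + (1.0 - p) * rho             # the change may occur at time n
    lr = math.exp(mu * y - 0.5 * mu * mu)    # likelihood ratio f1(y) / f0(y)
    return p_pred * lr / (p_pred * lr + (1.0 - p_pred))


def shiryaev_detect(ys, rho, mu, threshold, p0=0.0):
    """Return (n, p_n) at the first n with p_n >= threshold, or (None, p) if none."""
    p = p0
    for n, y in enumerate(ys, start=1):
        p = shiryaev_update(p, y, rho, mu)
        if p >= threshold:
            return n, p
    return None, p


# Deterministic toy sequence: five "pre-change" zeros, then five large values.
ys = [0.0] * 5 + [3.0] * 5
n_stop, p_stop = shiryaev_detect(ys, rho=0.1, mu=2.0, threshold=0.9)
print(n_stop, round(p_stop, 4))
```

With these inputs the posterior stays near 0.017 while the observations are zero and first crosses 0.9 at n = 7, the second observation after the shift. Q-learning becomes relevant when the stopping rule is learned from data rather than derived from a known model.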

Taxonomy

Topics: Advanced Statistical Process Monitoring · Optimization and Search Problems · Advanced Bandit Algorithms Research