Multiple Stopping Time POMDPs: Structural Results & Application in Interactive Advertising in Social Media
Vikram Krishnamurthy, Anup Aprem, Sujay Bhatt

TL;DR
This paper develops a structural framework for optimal multiple stopping policies in noisy Markov chain settings, with applications to social media advertising, demonstrating improved scheduling performance over existing methods.
Contribution
It introduces threshold-based structural results for POMDPs with multiple stopping times and proposes a stochastic gradient algorithm for policy approximation.
Findings
Optimal policies are characterized by threshold curves in the unit simplex of Bayesian posteriors.
The stopping sets exhibit a nested structure, simplifying policy design.
Applying the framework to social media advertising improves scheduling effectiveness.
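As a rough illustration of the first two findings, the sketch below encodes a linear threshold policy on the belief simplex and checks that the resulting stopping sets are nested. It is only a toy under stated assumptions: the coefficient vector `theta`, the per-level `offsets`, the three-state belief, and the direction of nesting (more stops remaining gives a larger stopping set) are all hypothetical choices made for illustration, not values or conventions taken from the paper.

```python
import numpy as np

# Hypothetical linear threshold parameters (not from the paper).
# A belief b is in the stopping set for level l when theta . b >= offsets[l].
theta = np.array([0.0, 1.0, 2.0])
offsets = {1: 1.5, 2: 1.0, 3: 0.5}  # key: number of stops remaining (assumed)


def should_stop(belief, stops_remaining):
    """Return True if the belief lies in the stopping set for this level."""
    return float(np.dot(theta, belief)) >= offsets[stops_remaining]


def stopping_levels(belief):
    """List the levels whose stopping set contains the belief."""
    return [l for l in sorted(offsets) if should_stop(belief, l)]


def is_nested(num_samples=1000, seed=0):
    """Check on random beliefs that stopping at level l implies stopping at l + 1."""
    rng = np.random.default_rng(seed)
    for belief in rng.dirichlet(np.ones(len(theta)), size=num_samples):
        levels = stopping_levels(belief)
        # Nested sets mean the levels that stop form a suffix of {1, 2, 3}.
        if levels and levels != list(range(levels[0], max(offsets) + 1)):
            return False
    return True
```

Because every level shares the same `theta` and the offsets decrease with the level, each stopping set is a half-space slice of the simplex contained in the next one, which is the simplest way to obtain nesting with linear thresholds.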
Abstract
This paper considers a multiple stopping time problem for a Markov chain observed in noise, where a decision maker chooses at most L stopping times to maximize a cumulative objective. We formulate the problem as a Partially Observed Markov Decision Process (POMDP) and derive structural results for the optimal multiple stopping policy. The main results are as follows: i) The optimal multiple stopping policy is shown to be characterized by threshold curves in the unit simplex of Bayesian posteriors. ii) The stopping sets (defined by the threshold curves) are shown to exhibit a nested structure. iii) The optimal cumulative reward is shown to be monotone with respect to the copositive ordering of the transition matrix. iv) A stochastic gradient algorithm is provided for estimating linear threshold policies by exploiting the structural results. These linear threshold policies approximate…
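The Bayesian posterior that the abstract refers to is the belief state of the POMDP, which for a finite-state Markov chain observed in noise can be updated with the standard hidden Markov model filter. The sketch below shows one such update step. The function name `hmm_filter`, the two-state transition matrix `P`, and the observation likelihood matrix `B` are hypothetical examples chosen for illustration, not quantities from the paper.

```python
import numpy as np


def hmm_filter(belief, P, B, y):
    """One step of the standard HMM filter (Bayesian belief update).

    belief: current posterior over states, shape (X,)
    P:      transition matrix, P[i, j] = Pr(next state j | state i)
    B:      observation likelihoods, B[j, y] = Pr(observation y | state j)
    y:      index of the observation just received
    """
    predicted = P.T @ belief             # prediction through the Markov chain
    unnormalized = B[:, y] * predicted   # weight by the observation likelihood
    return unnormalized / unnormalized.sum()


# Hypothetical two-state example (values are assumptions for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])
posterior = hmm_filter(prior, P, B, y=1)
```

The updated `posterior` stays in the unit simplex, so a threshold policy can be evaluated on it directly at each time step.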
Taxonomy
Topics: Optimization and Search Problems · Advanced Bandit Algorithms Research · Advanced Wireless Network Optimization
