Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems
Kunal Pattanayak, Vikram Krishnamurthy

TL;DR
This paper develops an IRL framework for Bayesian stopping time problems, providing necessary and sufficient conditions for action optimality, and demonstrates its effectiveness on theoretical examples and real-world YouTube data.
Contribution
It introduces a novel IRL method with set-valued cost function estimates for Bayesian stopping problems, using Bayesian revealed preferences and finite-sample analysis.
Findings
The IRL test identifies whether observed actions in Bayesian stopping problems are consistent with optimizing a cost function.
The method predicts user engagement on a YouTube dataset.
Finite-sample bounds characterize the reliability of the IRL detection algorithm.
Abstract
This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we illustrate using a YouTube dataset comprising metadata from 190,000 videos how the…
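To make the notion of a Bayesian stopping time problem concrete, the following is a minimal sketch of the forward problem in sequential hypothesis testing, that is, the kind of decision maker the inverse learner would observe. It is not the paper's IRL algorithm. The binary observation model, the likelihood values, the two belief thresholds, and the function names `posterior_update` and `run_sequential_test` are all assumptions chosen for illustration.

```python
def posterior_update(belief, obs, p_obs_h1, p_obs_h0):
    """Bayes update of the belief P(H1) after one binary observation.

    p_obs_h1 and p_obs_h0 are the (assumed) probabilities of observing 1
    under hypotheses H1 and H0 respectively.
    """
    like1 = p_obs_h1 if obs == 1 else 1.0 - p_obs_h1
    like0 = p_obs_h0 if obs == 1 else 1.0 - p_obs_h0
    numerator = like1 * belief
    return numerator / (numerator + like0 * (1.0 - belief))


def run_sequential_test(observations, prior=0.5, p_obs_h1=0.7, p_obs_h0=0.3,
                        lower=0.05, upper=0.95):
    """Stop as soon as the belief leaves the interval (lower, upper).

    Returns (decision, stopping_time, final_belief). The thresholds are
    illustrative values, not derived from any particular cost function.
    """
    belief = prior
    for t, obs in enumerate(observations, start=1):
        belief = posterior_update(belief, obs, p_obs_h1, p_obs_h0)
        if belief >= upper:
            return "H1", t, belief
        if belief <= lower:
            return "H0", t, belief
    # Ran out of observations before either threshold was crossed.
    return "undecided", len(observations), belief


if __name__ == "__main__":
    decision, stop_time, belief = run_sequential_test([1, 1, 1, 1, 1, 1])
    print(decision, stop_time, round(belief, 3))
```

With these assumed parameters, each observation of 1 multiplies the posterior odds of H1 by 7/3, so four consecutive ones are needed before the belief exceeds 0.95 and the decision maker stops. In the inverse setting described in the abstract, an observer would see only such stopping behavior and ask whether some cost function makes it optimal.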
Taxonomy
Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Smart Grid Energy Management
