Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems

Kunal Pattanayak; Vikram Krishnamurthy

arXiv:2007.03481 · cs.LG · March 29, 2023

Open Access

TL;DR

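This paper develops an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. It gives a necessary and sufficient condition for a decision maker's observed actions to be consistent with optimizing a cost function, and illustrates the framework on sequential hypothesis testing, Bayesian search, and a real-world YouTube dataset.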

Contribution

The paper introduces an IRL method that constructs set-valued estimates of the cost function for Bayesian stopping problems, using ideas from Bayesian revealed preferences in microeconomics together with a finite-sample analysis.

Findings

1. IRL can accurately identify optimality in Bayesian stopping scenarios.
2. The method successfully predicts user engagement on YouTube datasets.
3. Finite-sample bounds ensure reliability of the IRL detection algorithm.

Abstract

This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed strategies. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. As a real-world example, we illustrate using a YouTube dataset comprising metadata from 190,000 videos how the…
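
One of the two stopping time examples named in the abstract, sequential hypothesis testing, can be illustrated with a small sketch of the forward problem: a Bayesian decision maker updates a posterior over two hypotheses and stops once the posterior leaves a band. This is not the paper's IRL algorithm; it only shows the kind of stopping behavior an inverse learner might observe. The Bernoulli observation model, the function names, and the threshold values `lower` and `upper` are assumptions made for illustration.

```python
def posterior_update(prior, obs, p0, p1):
    """Bayes update of P(H1) after one Bernoulli observation (obs is 0 or 1).

    p0 and p1 are the assumed probabilities of observing 1 under H0 and H1.
    """
    like1 = p1 if obs == 1 else 1.0 - p1
    like0 = p0 if obs == 1 else 1.0 - p0
    num = prior * like1
    return num / (num + (1.0 - prior) * like0)


def run_stopping_rule(observations, prior=0.5, p0=0.3, p1=0.7,
                      lower=0.1, upper=0.9):
    """Stop as soon as the posterior P(H1) leaves the band (lower, upper).

    Returns (stopping_time, decision, posterior). The decision is 1 or 0
    for the declared hypothesis, or None if the data ran out before the
    posterior crossed either (assumed) threshold.
    """
    belief = prior
    for t, obs in enumerate(observations, start=1):
        belief = posterior_update(belief, obs, p0, p1)
        if belief >= upper:
            return t, 1, belief
        if belief <= lower:
            return t, 0, belief
    return len(observations), None, belief
```

With these assumed parameters, three consecutive observations of 1 push the posterior above 0.9, so `run_stopping_rule([1, 1, 1])` stops at time 3 and declares hypothesis 1.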

Taxonomy

Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Smart Grid Energy Management