Stateful Offline Contextual Policy Evaluation and Learning

Nathan Kallus; Angela Zhou

arXiv:2110.10081 · cs.LG · October 20, 2021

TL;DR

This paper develops a framework for off-policy evaluation and learning in a structured class of Markov decision processes driven by an exogenous sequence of arrivals with contexts, aimed at resource-constrained, personalized decision problems such as dynamic pricing.

Contribution

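It introduces an approach to off-policy evaluation built on the observation that an individual's response to an action is often not causally affected by the state variable, so it can be generalized across timesteps and states. The approach targets problems with potentially high-dimensional user types, such as dynamic personalized pricing and other operations management problems.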

Findings

01. Improved out-of-sample policy performance in simulations.
02. Analysis of error amplification over time.
03. Sample complexity results for the proposed method.

Abstract

We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions. This model can be thought of as an offline generalization of contextual bandits with resource constraints. We formalize the relevant causal structure of problems such as dynamic personalized pricing and other operations management problems in the presence of potentially high-dimensional user types. The key insight is that an individual-level response is often not causally affected by the state variable and can therefore easily be generalized across timesteps and states. When this is true, we study implications for (doubly robust) off-policy evaluation and learning by instead leveraging single time-step evaluation,…
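To make the idea of single time-step evaluation concrete, here is a minimal sketch of the standard doubly robust estimator for a one-step contextual bandit. It is not the paper's estimator. The pricing simulation, the names `true_reward`, `target_policy`, and `doubly_robust_value`, the price menu, and the uniform logging policy are all hypothetical choices made for illustration. The assumed response model depends only on the customer type and the offered price, not on any system state, which mirrors the structural assumption described in the abstract.

```python
import random

PRICES = [1.0, 2.0]  # hypothetical price menu: action 0 = low, action 1 = high


def true_reward(x, a):
    """Revenue when a customer of type x in [0, 1] is offered PRICES[a].

    Assumed response model: the customer buys iff the valuation 3 * x
    covers the price. It depends only on (x, a), not on any system state.
    """
    return PRICES[a] if 3.0 * x >= PRICES[a] else 0.0


def target_policy(x):
    """Action probabilities of the policy being evaluated (hypothetical)."""
    return [0.0, 1.0] if x >= 0.5 else [1.0, 0.0]


def doubly_robust_value(contexts, actions, rewards, propensities,
                        policy, reward_model, n_actions):
    """Standard single-step doubly robust estimate of a policy's mean reward.

    policy(x) returns a list of action probabilities; reward_model(x, a)
    returns an estimated mean reward; propensities holds the logging
    policy's probability of each logged action.
    """
    total = 0.0
    for x, a, r, p in zip(contexts, actions, rewards, propensities):
        pi = policy(x)
        # Direct-method term: model-based reward averaged over the target policy.
        direct = sum(pi[b] * reward_model(x, b) for b in range(n_actions))
        # Importance-weighted residual corrects errors in the reward model.
        correction = pi[a] / p * (r - reward_model(x, a))
        total += direct + correction
    return total / len(contexts)


# Logged data from a uniform-random logging policy (propensity 0.5 per action).
rng = random.Random(0)
n = 20000
contexts = [rng.random() for _ in range(n)]
actions = [rng.randrange(2) for _ in range(n)]
rewards = [true_reward(x, a) for x, a in zip(contexts, actions)]
propensities = [0.5] * n

# Estimate with the exact reward model, then with a deliberately wrong one.
dr_exact = doubly_robust_value(contexts, actions, rewards, propensities,
                               target_policy, true_reward, 2)
dr_zero = doubly_robust_value(contexts, actions, rewards, propensities,
                              target_policy, lambda x, a: 0.0, 2)
print(dr_exact, dr_zero)
```

Under this simulated setup the target policy's true value is 5/6, and both estimates should land near it up to sampling noise. The second call uses an all-zero reward model, in which case the estimator reduces to inverse propensity weighting and remains unbiased because the logging propensities are known.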

Taxonomy

Topics: Transportation and Mobility Innovations · Age of Information Optimization · Advanced Bandit Algorithms Research